1. ORIGINS OF MODERN ALGEBRA 


Modern algebra was developed to solve equations. 

The phrase “modern algebra” is a little vague, but it is commonly used to
describe the material that appeared in van der Waerden’s book Moderne Algebra,
first published in 1930. Van der Waerden first encountered this material when he
arrived at Göttingen in 1924. Among the primary developers of this material were
Dedekind, Weber, Hilbert, Lasker, Macaulay, Steinitz, Noether, Artin, Krull, and
Wedderburn (on rings, ideals, and modules), and Schur, Frobenius, Burnside,
Schreier, and Galois (on groups and their representations). Van der Waerden had
the advantage of attending lecture courses on algebra by Noether at Göttingen and
Artin at Hamburg.

Van der Waerden’s book is a marvel, as fresh today as when it was written. None 
of the hundreds of books covering similar ground written since casts the original 
into shadow. 

The two basic structures of modern algebra are groups and rings. 


2. FROM N TO Z TO Q TO Q, R AND C


I disagree with the following quotation: 


Die ganze Zahl schuf der liebe Gott, alles Übrige ist Menschenwerk. 
God created the integers, all else is the work of man. 
Kronecker 


Even the integers are the work of man. No doubt the first mathematical achieve- 
ment of man was to recognize when two non-empty sets had the same cardinality. 
Then came the abstraction, picking a single label, one, two, three, et cetera, to 
name/describe sets having the appropriate cardinality. Thus arose the natural 
numbers 1,2,3,.... 

There have been a number of primitive cultures which had no numbers beyond 
one, two, and three. Even cultures with more extended numbering systems have 
not always had a notion of zero. 

The creation of the natural numbers, indeed, of all mathematics, was motivated 
by man’s desire to understand and manipulate the world. Mathematics is a practical 
art. 

Many equations can be solved within the integers. One can pose simple
arithmetic problems arising from everyday life that can be solved within the
integers. A typical example might be: find an integer $x$ such that $x + 27 = 30$.
At a slightly more sophisticated level, one can imagine simple division problems,
such as finding $x$ such that $3x = 60$, that can also be solved within the
positive integers. However, a mild modification, such as $3x = 67$, leads to the
idea of division with remainder, and suggests how mankind was led to the rational
numbers.

One can also imagine the forces that prompted the notion of negative integers. 

The construction of the rationals Q from the integers Z can be formalized in such 
a way that a similar process applied to any domain produces its field of fractions 
(see section ??). The next result summarizes the utility of the rational numbers in 
terms of solving certain kinds of equations. Notice that the result holds true if any 
field is substituted for the rationals. 


Theorem 2.1. If $a, b, c$ are rational numbers with $a \neq 0$, then there is a
unique rational number $x$ such that $ax + b = c$.
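The unique solution is $x = a^{-1}(c - b)$. The minimal Python sketch below is our own illustration, not part of the notes. It uses the standard library's `fractions.Fraction` for exact arithmetic in $\mathbb{Q}$. The function name `solve_linear` is hypothetical.

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Return the unique rational x with a*x + b == c (requires a != 0)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be non-zero")
    # Subtract b from both sides, then multiply by the inverse of a.
    return (c - b) / a

print(solve_linear(3, 0, 67))                              # 67/3
print(solve_linear(Fraction(3, 4), 2, Fraction(1, 3)))     # -20/9
```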


After linear equations come quadratics. 
One of the great historical events concerning quadratics is Euclid’s famous proof
that $\sqrt{2}$ is not rational.


Theorem 2.2. There is no rational number whose square is two. 


Proof. Suppose to the contrary that $x$ is a rational number such that $x^2 = 2$.
Write $x = a/b$ where $a$ and $b$ are integers. By cancelling common factors, we
may assume that $a$ and $b$ have no common factor. Now, $2b^2 = a^2$, so 2
divides $a^2$. Hence 2 divides $a$ (the square of an odd integer is odd), and we
may write $a = 2c$. Hence $2b^2 = 4c^2$, and $b^2 = 2c^2$. It follows that $b^2$,
and hence $b$, is even. Thus $a$ and $b$ are both divisible by 2. This contradicts
the fact that $a$ and $b$ are relatively prime, so we conclude that 2 cannot be a
square in $\mathbb{Q}$.


This result was no doubt motivated by the problem of computing the length of
the hypotenuse of the isosceles right triangle with sides of length one.

Let’s focus on the proof of this result. The key point is that every non-zero 
element of Q can be written as a/b with a and b relatively prime. This fact is 
a consequence of a still more elementary fact, which we summarize in the next 
theorem. 


Theorem 2.3. Every non-zero integer $a$ can be written in an essentially unique
way as a product of primes,
$$a = p_1 p_2 \cdots p_n,$$
where $p_1, \dots, p_n$ are primes.


By a prime we mean an integer $p \neq \pm 1$ such that its only divisors are
$\pm 1$ and $\pm p$. Thus, the primes are $\{\pm 2, \pm 3, \pm 5, \dots\}$. When we
say “essentially unique” we mean that the factorizations
$$6 = 2 \cdot 3 = 3 \cdot 2 = (-3) \cdot (-1) \cdot 2 = 1 \cdot (-2) \cdot 3 \cdot (-1)$$
are to be viewed as the same; they differ only by order, by the signs of the
primes, and by the inclusion of the terms $\pm 1$.
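The following is a minimal sketch of computing the factorization in Theorem 2.3 by trial division. It assumes small inputs. Among the “essentially equal” factorizations, it picks one normal form: positive primes in increasing order, with the sign recorded separately. The name `factor` and this normal form are our own choices.

```python
def factor(n):
    """Factor a non-zero integer n by trial division.

    Returns (sign, primes) with n == sign * product(primes), sign in {1, -1},
    and the primes listed as positive integers in increasing order.
    """
    if n == 0:
        raise ValueError("0 has no prime factorization")
    sign = 1 if n > 0 else -1
    n = abs(n)
    primes = []
    d = 2
    while d * d <= n:
        while n % d == 0:      # divide out each prime factor as often as possible
            primes.append(d)
            n //= d
        d += 1
    if n > 1:                  # whatever is left over is itself prime
        primes.append(n)
    return sign, primes

print(factor(6))      # (1, [2, 3])
print(factor(-360))   # (-1, [2, 2, 2, 3, 3, 5])
```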

Two integers are relatively prime if the only integers that divide both of them
are $\pm 1$.

This theme, the unique factorization of integers and their relatives, reappeared 
often in the early development of modern algebra, and it remains a staple of intro- 
ductory algebra courses. 


That the Greeks’ view of numbers and algebra was intimately connected to
geometry is well documented. They had no problem accepting the existence of
numbers of the form $\sqrt{d}$ with $d$ rational, because Pythagoras’s theorem
showed that a right-angled triangle in which the lengths of two sides are rational
has a third side whose length is of the form $\sqrt{d}$. Accepting such numbers on
an (almost) equal footing with the rationals allowed the solution of a range of
quadratic equations with rational coefficients.

Thus, in modern parlance, the Greeks were quite happy computing in fields such
as $\mathbb{Q}(\sqrt{d})$ when $d$ is a positive rational number.

Of course it is obvious that the equation $x^2 = -1$ has no solution in
$\mathbb{Q}$, but the reason that it has no solution is quite different from the
reason that $x^2 = 2$ has no solution. One can imagine that the fact that
$x^2 = -1$ has no rational solution did not worry people much. It probably seemed
a foolish waste of time even to consider that a problem. However, it is less
apparent that an equation such as $x^2 + 2x + 2 = 0$ has no rational solution,
and the discovery of this fact must surely have been intimately related to the
discovery of the general solution to a quadratic equation. Several ancient
cultures independently discovered the result that


$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
gives the two solutions to the quadratic equation $ax^2 + bx + c = 0$. This formula
gives a criterion that the quadratic has no solution (within the reals) if
$b^2 - 4ac < 0$.
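Here is a short Python sketch of the formula. It is our own illustration, not the notes'. It uses the standard `math` and `cmath` modules. When $b^2 - 4ac < 0$, it falls back to the complex square root, which is exactly the step that leads beyond the reals. The sketch assumes floating-point arithmetic, so the roots are approximate in general.

```python
import cmath
import math

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0 (a != 0) via the quadratic formula."""
    if a == 0:
        raise ValueError("not a quadratic")
    disc = b * b - 4 * a * c
    if disc >= 0:
        r = math.sqrt(disc)    # real square root: two real roots
    else:
        r = cmath.sqrt(disc)   # b^2 - 4ac < 0: no real roots, use sqrt(-1)
    return (-b + r) / (2 * a), (-b - r) / (2 * a)

print(quadratic_roots(1, -3, 2))   # (2.0, 1.0)
print(quadratic_roots(1, 2, 2))    # ((-1+1j), (-1-1j))
```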

This, after many centuries, led to the invention/discovery of $\sqrt{-1}$ and
eventually to the notion of complex numbers. This in turn leads to the following
question: if $f(x)$ is a polynomial with coefficients in a field $k$, is there a
field $K$ containing $k$ in which $f$ has a zero? We take up this question in
section 6.

Once the formula for the roots of a quadratic polynomial had been discovered,
attention turned to the question of whether there are analogous formulas for the
solutions of higher degree polynomials. Eventually, Galois gave a comprehensive
solution to this problem, and we will encounter Galois theory later in this course.

Once the ancients had realized that one could pass beyond the rationals
$\mathbb{Q}$ to include roots of rational numbers and more complicated expressions
built from such roots, it was natural to ask if this gave “all” numbers. This
question is crystallized by asking whether $\pi$ is a zero of a polynomial with
rational coefficients. More generally, this leads to the distinction between
algebraic and transcendental elements over an arbitrary field.


3. RINGS 


Definition 3.1. A ring is a non-empty set $R$ endowed with two operations,
addition (denoted by $+$) and multiplication (denoted by $\times$ or $\cdot$ or
juxtaposition), and satisfying for all $a, b, c \in R$:
(1) $a + b \in R$;
(2) $a + (b + c) = (a + b) + c$;
(3) $a + b = b + a$;
(4) $R$ has a zero element, denoted 0, with the property that $0 + a = a + 0 = a$;
(5) the equation $a + x = 0$ has a solution $x \in R$; we write $-a$ for $x$ and
call it the negative of $a$;
(6) $ab \in R$;
(7) $a(bc) = (ab)c$;
(8) $a(b + c) = ab + ac$ and $(b + c)a = ba + ca$.
Conditions (1)-(5) say that $(R, +)$ is an abelian group with identity 0. Notice
that we do not call 0 the identity element, but the zero element, of $R$.
Condition (8) connects the two different operations $+$ and $\times$. Conditions
(6) and (7) are analogues of conditions (1) and (2), but there are no analogues of
conditions (3), (4), and (5) for multiplication. Rings in which analogues of those
conditions hold are given special names.

The smallest ring is that consisting of just one element 0; we call it the trivial 
ring. 

One can use the distributive law to show that $0 \cdot 0 = 0$ and, more generally,
that $a \cdot 0 = 0$ for all $a \in R$.


Definition 3.2. We say that $R$ is a ring with identity if there is an element
$1 \in R$ such that $1 \cdot a = a \cdot 1 = a$ for all $a \in R$. We call 1 the
identity element.


It is easy to show that a ring can have at most one identity element. 

If $R$ is not the trivial ring and has an identity, then $1 \neq 0$; it might be
easier to show that if $R$ has an identity and $1 = 0$, then $R$ is trivial. We will
often assume that $1 \neq 0$; this simply means that $R \neq \{0\}$.

Convention. All the rings in this course will have an identity element. Most
rings one encounters in algebra do have an identity. This is not so in analysis; if
$X$ is a locally compact, non-compact Hausdorff space, the ring of continuous
$\mathbb{R}$-valued functions on $X$ that vanish at infinity does not have an
identity.


Definition 3.3. A ring $R$ is commutative if $ab = ba$ for all $a, b \in R$.


The rings you are most familiar with, namely Z,Q,R, and C, are commutative 
and have an identity. As the next example shows, many important rings are not 
commutative. 


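Example 3.4. Let $S$ be a ring. We define $M_n(S)$, the ring of $n \times n$
matrices with entries in $S$, as follows. As a set it consists of all matrices
$$\begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1n} \\ s_{21} & s_{22} & \cdots & s_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ s_{n1} & s_{n2} & \cdots & s_{nn} \end{pmatrix}$$
where the individual entries $s_{ij}$ are elements of $S$.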

The addition on $M_n(S)$ is induced by that on $S$. If $a = (s_{ij})$ and
$b = (t_{ij})$ are in $M_n(S)$, we define
$$a + b := (s_{ij} + t_{ij}),$$
the matrix whose $ij$-th entry is $s_{ij} + t_{ij}$, the sum of the $ij$-th entries
of $a$ and $b$.

You should check that this makes $M_n(S)$ an abelian group. Indeed, as a group,
$M_n(S)$ is isomorphic to $S \times \cdots \times S$, the product of $n^2$ copies
of $S$.

The multiplication in $M_n(S)$, called matrix multiplication, is defined by
$$(s_{ij}) \cdot (t_{ij}) := \Big( \sum_{k=1}^{n} s_{ik} t_{kj} \Big).$$
That is, the $ij$-th entry in $ab$ is the dot product of the $i$-th row of $a$ with
the $j$-th column of $b$.

It is rather tedious to show that this multiplication makes $M_n(S)$ a ring. The
zero element in $M_n(S)$ is the matrix with all its entries equal to zero.

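If $S$ has an identity and $S \neq 0$, then $M_n(S)$ is not commutative for
$n \geq 2$; for example, if
$$a = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},$$
then $ab = 0 \neq ba$.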
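The matrices in this example can be multiplied mechanically. The hypothetical helper `mat_mul` below implements the row-times-column rule of Example 3.4 for integer entries, so $S = \mathbb{Z}$ here. It reproduces $ab = 0 \neq ba$.

```python
def mat_mul(a, b):
    """Multiply square matrices given as lists of rows (integer entries)."""
    n = len(a)
    # The (i, j) entry is the dot product of row i of a with column j of b.
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

a = [[0, 1],
     [0, 0]]
b = [[1, 0],
     [0, 0]]
print(mat_mul(a, b))   # [[0, 0], [0, 0]]
print(mat_mul(b, a))   # [[0, 1], [0, 0]]
```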


Convention. All the rings in this course will be commutative. I will therefore
make definitions that are appropriate for commutative rings. Whenever I say “ring”
I mean “commutative ring”.


Definition 3.5. Let $R$ be a commutative ring with identity. An element $a \in R$
is called a unit if the equation $ax = 1$ has a solution in $R$. Such a solution is
unique; it is called the inverse of $a$ and is denoted by $a^{-1}$.


Let’s check that the inverse is indeed unique: if $ab = ac = 1$, then
$$b = b \cdot 1 = b(ac) = (ba)c = (ab)c = 1 \cdot c = c.$$


Example 3.6. Let $V$ be an infinite dimensional vector space. There are linear
maps $u: V \to V$ and $v: V \to V$ such that $uv = 1$, but $vu \neq 1$. Here 1
denotes the identity map.
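The notes do not specify $u$ and $v$. One standard choice, assumed here, takes $V$ to be the space of sequences $(x_0, x_1, \dots)$ with only finitely many non-zero terms. Then $v$ is the right shift and $u$ is the left shift. In this Python sketch, a sequence is a list with trailing zeros ignored, and $uv$ means “first apply $v$, then $u$”.

```python
def right_shift(x):
    """v: (x0, x1, x2, ...) -> (0, x0, x1, ...)."""
    return [0] + list(x)

def left_shift(x):
    """u: (x0, x1, x2, ...) -> (x1, x2, ...)."""
    return list(x[1:])

def strip(x):
    """Drop trailing zeros so that equal sequences compare equal."""
    x = list(x)
    while x and x[-1] == 0:
        x.pop()
    return x

x = [5, 7, 9]
print(strip(left_shift(right_shift(x))))   # [5, 7, 9]: uv acts as the identity
print(strip(right_shift(left_shift(x))))   # [0, 7, 9]: vu forgets x0
```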
Definition 3.7. A non-zero element $a$ in a ring $R$ is a zero-divisor if there is a
non-zero element $b$ such that $ab = 0$. A ring without zero-divisors is called a
domain.


In other words, a ring is a domain if and only if every product of non-zero
elements is non-zero.


Lemma 3.8. Let $R$ be a commutative ring. Then $R$ is a domain if and only if we
can cancel in the following sense: whenever $0 \neq a \in R$ and $ab = ac$, then
$b = c$.


Proof. 


Recall that for any group $G$ and any element $g \in G$, there is a unique group
homomorphism $\phi: \mathbb{Z} \to G$ such that $\phi(1) = g$.

In particular, if $R$ is a ring with identity, there is a unique group homomorphism
$\phi: \mathbb{Z} \to (R, +)$ such that $\phi(1) = 1$. Warning: the two 1s in this
equation are different: the first is the $1 \in \mathbb{Z}$ and the second is the
$1 \in R$.

We will write $n$ for $\phi(n)$, but of course you must then be careful when you
see $n$ to know which $n$ you mean!

Subrings. A subring of a ring $R$ is a subset $S$ which is closed under addition,
subtraction, and multiplication, and contains $1_R$.


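Example 3.9. Let $d$ be an integer that is not a square. We define
$$\mathbb{Z}[\sqrt{d}] = \{a + b\sqrt{d} \mid a, b \in \mathbb{Z}\}.$$
This is a subset of $\mathbb{C}$ and is closed under multiplication, addition, and
subtraction, meaning that the product, sum, and difference of two elements in
$\mathbb{Z}[\sqrt{d}]$ belong to $\mathbb{Z}[\sqrt{d}]$. For the product this follows
from $(a + b\sqrt{d})(a' + b'\sqrt{d}) = (aa' + bb'd) + (ab' + a'b)\sqrt{d}$. Hence
$\mathbb{Z}[\sqrt{d}]$ is a ring, a subring of $\mathbb{C}$.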
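As a sketch, an element $a + b\sqrt{d}$ can be stored as the integer pair `(a, b)`. The product formula then shows directly that the result again has integer coordinates. The helper names `mul`, `add`, and `sub` are hypothetical.

```python
def mul(x, y, d):
    """Multiply a + b*sqrt(d) by c + e*sqrt(d), each stored as an integer pair."""
    a, b = x
    c, e = y
    # (a + b sqrt d)(c + e sqrt d) = (ac + bed) + (ae + bc) sqrt d
    return (a * c + b * e * d, a * e + b * c)

def add(x, y):
    return (x[0] + y[0], x[1] + y[1])

def sub(x, y):
    return (x[0] - y[0], x[1] - y[1])

print(mul((1, 1), (1, -1), -5))   # (6, 0): (1 + sqrt(-5))(1 - sqrt(-5)) = 6
print(mul((1, 1), (1, 1), 2))     # (3, 2): (1 + sqrt 2)^2 = 3 + 2 sqrt 2
```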
The product $R \times S$ of two rings. The Cartesian product
$$R \times S := \{(r, s) \mid r \in R, s \in S\}$$
of rings $R$ and $S$ can be given the structure of a ring by declaring
$$(r, s) + (r', s') := (r + r', s + s'),$$
$$(r, s) \cdot (r', s') := (rr', ss').$$
We leave it to the reader to check the details. Of course, you already checked in
Math 402 that $(R \times S, +)$ is an abelian group. The zero element is $(0, 0)$,
and the identity is $(1, 1)$.


Some Exercises. 
In all these exercises, the elements $a, b, c, \dots$ belong to a commutative ring $R$.
(1) Use the distributive law to show that $a \cdot 0 = 0$ for all $a \in R$.
(2) Show that a ring can have at most one identity element.
(3) Let $R$ be a ring with identity. Show that $R$ is the trivial ring (i.e., consists
only of 0) if and only if $1 = 0$.


(4) Let $R$ be a ring. In the abelian group $(R, +)$ we denote the inverse of $a$
by $-a$; thus $a + (-a) = (-a) + a = 0$. Of course we write $b - a$ to mean
$b + (-a)$. Show that this minus sign has the following properties:

(a) $a \cdot (-b) = (-b) \cdot a = -(ab)$;

(b) $(-a) \cdot (-b) = ab$;

(c) $(-1) \cdot a = -a$.

(5) Show that a finite commutative domain is a field.

(6) Let $\phi: \mathbb{Z} \to (R, +)$ be the group homomorphism defined by
$\phi(1) = 1$. Show that $\phi(nm) = \phi(n)\phi(m)$ for all $n, m \in \mathbb{Z}$.
Be careful when $n$ or $m$ is negative.

(7) Let $n$ be a positive integer. Show that $n \cdot a$, the product in $R$ of $n$
(the image of $n \in \mathbb{Z}$ under the homomorphism $\phi: \mathbb{Z} \to (R, +)$
defined by $\phi(1) = 1$) and $a$, is equal to $a + \cdots + a$, the sum of $a$ with
itself $n$ times.

(8) The rings $\mathbb{Z}_p$ with $p$ prime are NOT the only finite fields. For
example, there is a field with 4 elements. Write out the addition and
multiplication tables for a field with 4 elements. Denote the field by $F$. It must
contain 0 and 1 and some element, say $a$, that is equal to neither 0 nor 1. It
follows that $F = \{0, 1, a, a + 1\}$; why? Then write down the addition and
multiplication tables, explaining how you get the entries.


4. FINITE FIELDS 
Finite fields play a central role in number theory, and in applications of algebra 
to communications, coding theory, and several other computer-related areas. 


The cyclic groups $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$ may be given the structure
of a ring. Just as the addition on $\mathbb{Z}$ induced the addition on
$\mathbb{Z}_n$, so does the multiplication on $\mathbb{Z}$ induce a multiplication
on $\mathbb{Z}_n$.


Lemma 4.1. Let $n$ be an integer and $\mathbb{Z}_n = \mathbb{Z}/n\mathbb{Z}$ the
quotient group. Then $\mathbb{Z}_n$ becomes a commutative ring with identity $[1]$
under the multiplication defined by
$$[a] \cdot [b] := [ab].$$


Proof. 


We will tend to write $a$, or $\bar{a}$, for $[a]$, hoping that the context will
always make the notation unambiguous.

Recall that the map $\phi: \mathbb{Z} \to \mathbb{Z}_n$, $\phi(a) = \bar{a}$, is a
group homomorphism. It also satisfies $\phi(ab) = \phi(a)\phi(b)$; i.e., $\phi$
‘respects’, or is ‘compatible with’, both the additive and multiplicative structures
in $\mathbb{Z}$ and $\mathbb{Z}_n$; this says that $\phi$ is a ring homomorphism
(see Definition 7.1 below).


Lemma 4.2. Let $a$ and $n$ be integers. Then $(a, n) = 1$ if and only if the
equation $ax = 1$ has a solution in $\mathbb{Z}_n$.


Proof. 


Theorem 4.3. $\mathbb{Z}_n$ is a field if and only if $n$ is prime.


Proof. ($\Leftarrow$) If $n$ is prime and $0 \neq [a] \in \mathbb{Z}_n$, then $n$
does not divide $a$, so $(a, n) = 1$. By Lemma 4.2, there is an element
$x \in \mathbb{Z}_n$ such that $ax = 1$. Hence $\mathbb{Z}_n$ is a field.

($\Rightarrow$) We will prove this by contradiction. Suppose $n$ is not prime. Then
$n = ab$ with $\{a, b\} \cap \{1, -1\} = \emptyset$. In particular, $n$ does not divide
$a$, so $a$ is a non-zero element of $\mathbb{Z}_n$. If $a$ had an inverse, say
$xa = 1$, in $\mathbb{Z}_n$, then we would have the following in $\mathbb{Z}_n$:
$$b = 1 \cdot b = (xa)b = x(ab) = 0.$$
But this implies that $n$ divides $b$, whence $a$ must be $\pm 1$; this is a
contradiction, so we conclude that $a$ cannot have an inverse in $\mathbb{Z}_n$,
and hence $\mathbb{Z}_n$ is not a field.
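For small $n$, Lemma 4.2 and Theorem 4.3 can be checked by brute force. The Python sketch below is our own sanity check, not a proof. It searches for inverses in $\mathbb{Z}_n$ for $2 \leq n \leq 30$. The helper names are hypothetical.

```python
from math import gcd, isqrt

def is_field_Zn(n):
    """Brute force: does every non-zero class in Z_n (n >= 2) have an inverse?"""
    return all(any(a * x % n == 1 for x in range(n)) for a in range(1, n))

def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))

for n in range(2, 31):
    for a in range(1, n):
        has_inverse = any(a * x % n == 1 for x in range(n))
        assert has_inverse == (gcd(a, n) == 1)     # Lemma 4.2
    assert is_field_Zn(n) == is_prime(n)           # Theorem 4.3

print([n for n in range(2, 31) if is_field_Zn(n)])
# [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```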


Notation. If $p$ is a positive prime integer, we write $\mathbb{F}_p$ for the field
with $p$ elements. In other words, $\mathbb{F}_p = \mathbb{Z}_p$. Later on we shall
see that there is a finite field with $p^n$ elements, and we will denote it by
$\mathbb{F}_{p^n}$. Up to isomorphism, these are all the finite fields.


Inverses in $\mathbb{F}_p$. Consider the problem of explicitly finding the inverse
of an element in $\mathbb{F}_p$. For example, what is the inverse of 13 in
$\mathbb{F}_{19}$? Since 19 is prime, the greatest common divisor of 13 and 19 is
1, and there are integers $a$ and $b$ such that $1 = 13a + 19b$. The image of $a$
in $\mathbb{F}_{19}$ is the inverse of 13. To find $a$ and $b$ we apply the
Euclidean algorithm to get
$$19 = 1 \times 13 + 6, \qquad 13 = 2 \times 6 + 1,$$
so
$$1 = 13 - 2 \times 6 = 13 - 2 \times (19 - 13) = 3 \times 13 - 2 \times 19.$$
Hence $13^{-1} = 3$ in $\mathbb{F}_{19}$. Check that $13 \times 3 = 39 = 2 \times 19 + 1$.
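This back-substitution is the extended Euclidean algorithm. A recursive Python sketch follows. The names `extended_gcd` and `inverse_mod` are our own.

```python
def extended_gcd(a, b):
    """Return (g, s, t) with g = gcd(a, b) = s*a + t*b."""
    if b == 0:
        return a, 1, 0
    q, r = divmod(a, b)                # a = q*b + r
    g, s, t = extended_gcd(b, r)       # g = s*b + t*r
    # substitute r = a - q*b:  g = t*a + (s - q*t)*b
    return g, t, s - q * t

def inverse_mod(a, p):
    """Inverse of a modulo p, provided gcd(a, p) = 1."""
    g, s, _ = extended_gcd(a, p)
    if g != 1:
        raise ValueError(f"{a} is not invertible modulo {p}")
    return s % p

print(extended_gcd(13, 19))   # (1, 3, -2): 1 = 3*13 - 2*19
print(inverse_mod(13, 19))    # 3
```

In Python 3.8 and later, the built-in `pow(13, -1, 19)` computes the same modular inverse.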


5. OTHER FIELDS 


It would be foolish to develop a theory of fields if those in Theorem 4.3 were the 
only examples. Fields abound. The simplest examples beyond those you already 
know are those in the next example. 


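Example 5.1. Let $d$ be a rational number that is not a square in $\mathbb{Q}$. The
subset
$$\mathbb{Q}(\sqrt{d}) := \{a + b\sqrt{d} \mid a, b \in \mathbb{Q}\}$$
of $\mathbb{C}$ is closed under multiplication and addition, meaning that the
product and sum of two elements of $\mathbb{Q}(\sqrt{d})$ belong to
$\mathbb{Q}(\sqrt{d})$, so it is a subring of $\mathbb{C}$. The inverse (in
$\mathbb{C}$) of a non-zero element of $\mathbb{Q}(\sqrt{d})$ belongs to
$\mathbb{Q}(\sqrt{d})$, namely
$$(a + b\sqrt{d})^{-1} = \frac{a}{a^2 - b^2 d} - \frac{b}{a^2 - b^2 d}\sqrt{d};$$
the denominator is non-zero because $d$ is not a square in $\mathbb{Q}$. Indeed, if
$a^2 = b^2 d$ with $b \neq 0$, then $d = (a/b)^2$; if $b = 0$, then $a \neq 0$. Thus
$\mathbb{Q}(\sqrt{d})$ is a field.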
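Below is a minimal sketch of arithmetic in $\mathbb{Q}(\sqrt{d})$. It stores $a + b\sqrt{d}$ as a pair of `fractions.Fraction` values, and the helper names are hypothetical. It checks the inverse formula of Example 5.1 on $3 + \tfrac{1}{2}\sqrt{2}$.

```python
from fractions import Fraction

def qsqrt_mul(x, y, d):
    """Multiply a + b*sqrt(d) by c + e*sqrt(d), elements stored as pairs."""
    a, b = x
    c, e = y
    return (a * c + b * e * d, a * e + b * c)

def qsqrt_inv(x, d):
    """Inverse of a non-zero a + b*sqrt(d), using the formula of Example 5.1."""
    a, b, d = Fraction(x[0]), Fraction(x[1]), Fraction(d)
    norm = a * a - b * b * d       # non-zero when d is not a square in Q
    if norm == 0:
        raise ZeroDivisionError("zero element, or d is a square in Q")
    return (a / norm, -b / norm)

d = Fraction(2)
x = (Fraction(3), Fraction(1, 2))          # 3 + (1/2) sqrt 2
y = qsqrt_inv(x, d)
print(y)                    # (Fraction(6, 17), Fraction(-1, 17))
print(qsqrt_mul(x, y, d))   # (Fraction(1, 1), Fraction(0, 1))
```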


Exercise. Let $n$ be a positive integer and $\zeta = e^{2\pi i/n}$. Show that
$$\mathbb{Q}(\zeta) := \{a_0 + a_1\zeta + \cdots + a_{n-1}\zeta^{n-1} \mid a_0, \dots, a_{n-1} \in \mathbb{Q}\}$$
is a subfield of $\mathbb{C}$.


Exercise. Think of six interesting questions about the fields $\mathbb{F}_p$,
$\mathbb{Q}(\sqrt{d})$, and $\mathbb{Q}(\zeta)$.

Later, we will examine fields in some detail, but for now we simply introduce 
them as a necessary preliminary for our discussion of polynomials. Fields provide 
the coefficients for polynomials. 

The letter $k$ is often used to denote a field because German mathematicians, who
were the first to examine fields in some detail, called a field ein Körper
(Körper = body, cf. “corpse”). Despite this nomenclature, the study of fields
remains a lively topic.


There are finite fields $\mathbb{F}_{p^n}$ with $p^n$ elements for every prime $p$
and every integer $n \geq 1$. Here, for example, is how to construct $\mathbb{F}_4$
with your bare hands.

Construction of $\mathbb{F}_4$. First, $\mathbb{F}_4$ contains a zero and an
identity and, because it has four elements, an element different from both zero
and one that we will call $a$.

We can add in $\mathbb{F}_4$, so $\mathbb{F}_4$ contains an element $a + 1$. We will
now show that $a + 1 \notin \{0, 1, a\}$. To do this we first show that $1 + 1 = 0$
in $\mathbb{F}_4$. To see this, observe that $(\mathbb{F}_4, +)$ is a group with four
elements, so every element in it has order dividing 4; in particular,


$$0 = 1 + 1 + 1 + 1 = (1 + 1) \cdot (1 + 1) = (1 + 1)^2,$$
but $\mathbb{F}_4$ is a field, hence a domain, so $1 + 1 = 0$. We can also write
this as $-1 = 1$ in $\mathbb{F}_4$.

If $a + 1 = 0$, then adding 1 to both sides gives $a = 1$, a contradiction; if
$a + 1 = 1$, then adding 1 to both sides gives $a = 0$, a contradiction; if
$a + 1 = a$, then subtracting $a$ from both sides gives $1 = 0$, a contradiction.
We conclude that $a + 1 \notin \{0, 1, a\}$, and hence


$$\mathbb{F}_4 = \{0, 1, a, a + 1\}.$$


We have already done most of the work to construct the addition table; the only
other calculation that needs to be done is
$$a + a = 1 \cdot a + 1 \cdot a = (1 + 1) \cdot a = 0 \cdot a = 0.$$


The essential calculation needed to construct the multiplication table for
$\mathbb{F}_4$ is to determine $a^2$. Since $\mathbb{F}_4$ is a field and $a \neq 0$,
$a^2 \neq 0$. If $a^2 = 1$, then
$$0 = a^2 - 1 = (a + 1)(a - 1) = (a + 1)(a + 1),$$
and this cannot happen because $\mathbb{F}_4$ is a domain and $a + 1$ is not zero.
If $a^2 = a$, then
$$0 = a^2 - a = a(a - 1) = a(a + 1),$$
and this cannot happen because $\mathbb{F}_4$ is a domain. Since $a^2$ must be one
of the four elements of $\mathbb{F}_4$, the only possibility is that $a^2 = a + 1$.
It is now easy to write out the multiplication table.
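With $1 + 1 = 0$ and $a^2 = a + 1$ in hand, the tables can be generated mechanically. This sketch uses our own encoding: the element $c_0 + c_1 a$ is the pair `(c0, c1)` with entries in $\{0, 1\}$. The names `f4_add` and `f4_mul` are hypothetical.

```python
# Elements of F_4 as pairs (c0, c1) standing for c0 + c1*a, with c0, c1 in {0, 1}.
ELEMENTS = [(0, 0), (1, 0), (0, 1), (1, 1)]
NAMES = {(0, 0): "0", (1, 0): "1", (0, 1): "a", (1, 1): "a+1"}

def f4_add(x, y):
    # 1 + 1 = 0, so coefficients add modulo 2
    return ((x[0] + y[0]) % 2, (x[1] + y[1]) % 2)

def f4_mul(x, y):
    # (c0 + c1 a)(e0 + e1 a) = c0 e0 + (c0 e1 + c1 e0) a + c1 e1 a^2,
    # then replace a^2 by a + 1
    c0, c1 = x
    e0, e1 = y
    top = c1 * e1                      # coefficient of a^2
    return ((c0 * e0 + top) % 2, (c0 * e1 + c1 * e0 + top) % 2)

for name, op in (("+", f4_add), ("*", f4_mul)):
    print(name.ljust(4), *(NAMES[y].ljust(4) for y in ELEMENTS))
    for x in ELEMENTS:
        print(NAMES[x].ljust(4), *(NAMES[op(x, y)].ljust(4) for y in ELEMENTS))
    print()

# Every non-zero element has a multiplicative inverse, as it must in a field.
ONE = (1, 0)
assert all(any(f4_mul(x, y) == ONE for y in ELEMENTS) for x in ELEMENTS[1:])
```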


6. THE POLYNOMIAL RING IN ONE VARIABLE 


Throughout this section k denotes a field. 

Let R be a commutative ring. To begin with you might think of R being the 
integers, or the rationals, the reals, or some other field you know and love. Poly- 
nomials in one variable, say x, with coefficients in R can be added and multiplied 
in the obvious way to produce another polynomial with coefficients in R. 

We write $R[x]$ for the set of all polynomials in $x$ with coefficients in $R$. An
element of $R[x]$ is an expression
$$a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$$
where the coefficients $a_i$ belong to $R$. Addition and multiplication are defined
in the obvious way. Two polynomials are considered to be the same only if all their
coefficients are the same. In this way $R[x]$ becomes a ring, with zero element the
zero polynomial 0, and identity element the constant polynomial 1.


Definition 6.1. Let $R$ be a ring. The polynomial ring with coefficients in $R$,
which we denote by $R[x]$, consists of all formal expressions
$$\alpha_0 + \alpha_1 x + \alpha_2 x^2 + \cdots + \alpha_n x^n$$
where $\alpha_0, \dots, \alpha_n \in R$, and this is made into a ring by defining the
sum and product of two polynomials by
$$\sum_i \alpha_i x^i + \sum_i \beta_i x^i := \sum_i (\alpha_i + \beta_i) x^i$$
and
$$\Big( \sum_i \alpha_i x^i \Big) \Big( \sum_j \beta_j x^j \Big) := \sum_n \Big( \sum_{i+j=n} \alpha_i \beta_j \Big) x^n.$$
We call $\alpha_0, \dots, \alpha_n$ the coefficients of $\sum_{i=0}^{n} \alpha_i x^i$.
We say that two polynomials are equal if and only if they have the same
coefficients. We call $x$ an indeterminate.


We leave it to the reader to check that R[x] is a ring. 
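The two formulas of Definition 6.1 translate directly into code. In this sketch, a polynomial is the Python list of its coefficients $[\alpha_0, \dots, \alpha_n]$, and the empty list plays the role of the zero polynomial. The helper names are our own. Any coefficients that Python can add and multiply exactly, such as `int` or `Fraction`, will work.

```python
from itertools import zip_longest

def trim(f):
    """Remove zero leading coefficients; the zero polynomial becomes []."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f

def poly_add(f, g):
    # the coefficient of x^i in f + g is alpha_i + beta_i
    return trim(a + b for a, b in zip_longest(f, g, fillvalue=0))

def poly_mul(f, g):
    # the coefficient of x^n in fg is the sum of alpha_i * beta_j over i + j = n
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return trim(h)

f = [1, 1]    # 1 + x
g = [-1, 1]   # -1 + x
print(poly_add(f, g))   # [0, 2], i.e. 2x
print(poly_mul(f, g))   # [-1, 0, 1], i.e. x^2 - 1
```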


We are particularly interested in the case when R is a field. 

The ring of polynomials in one variable with coefficients in a field behaves in 
many respects like the ring of integers. We will see this when we consider questions 
of division and factorization. 

Recall that if $a$ and $b$ are integers with $b$ non-zero, then there are integers
$q$ and $r$ such that $a = bq + r$ and $0 \leq r < |b|$. We usually call $r$ the
remainder. This result plays a key role in arithmetic. To show that there is an
analogous result for $k[x]$ we need a notion of “size” to replace absolute value.

The degree of a non-zero element $f = a_n x^n + \cdots + a_1 x + a_0$ in $R[x]$ is
$n$ provided that $a_n \neq 0$. In that case we call $a_n$ the leading coefficient of
$f$. If $f = 0$ it is convenient to define its degree to be $-\infty$. It is a trivial
observation that the units in $k[x]$ are precisely the polynomials of degree zero.
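With degree as the measure of size, long division works over any field. The following sketch over $k = \mathbb{Q}$ computes the quotient and remainder of the division algorithm that the notes later cite as Proposition 11.1. It uses `fractions.Fraction` for exact arithmetic, and the function name and list encoding are our own. Dividing by the leading coefficient of $f$ is the step where we use that $k$ is a field.

```python
from fractions import Fraction

def poly_divmod(g, f):
    """Divide g by f != 0 in Q[x]; return (q, r) with g = q*f + r, deg r < deg f.

    Polynomials are coefficient lists [c_0, ..., c_n] with c_n != 0;
    [] is the zero polynomial.
    """
    f = [Fraction(c) for c in f]
    r = [Fraction(c) for c in g]
    n = len(f) - 1                          # deg f
    q = [Fraction(0)] * max(len(r) - n, 1)
    while r and len(r) - 1 >= n:
        shift = len(r) - 1 - n              # deg r - deg f
        c = r[-1] / f[-1]                   # leading coefficient of f is invertible
        q[shift] = c
        for i, fc in enumerate(f):          # r := r - c * x^shift * f
            r[i + shift] -= c * fc
        while r and r[-1] == 0:             # the leading term has cancelled
            r.pop()
    return q, r

q, r = poly_divmod([1, -2, 0, 1], [-1, 1])  # x^3 - 2x + 1 divided by x - 1
print(q, r)                                 # quotient x^2 + x - 1, remainder 0
```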


Lemma 6.2. Let $R$ be a domain and let $f, g \in R[x]$. Then

(1) $\deg(f + g) \leq \max\{\deg f, \deg g\}$;
(2) $\deg(fg) = \deg f + \deg g$;
(3) $R[x]$ is a domain.


More variables. It is clear that we can jazz up this definition and define for
any positive integer $n$ the polynomial ring in $n$ variables, $k[x_1, \dots, x_n]$.
The rings $k[x, y]$ and $k[x, y, z]$, or perhaps $\mathbb{R}[x, y]$ and
$\mathbb{R}[x, y, z]$, should not cause too much fear. Just add and multiply
polynomials in the way you have been doing for years.


7. RING HOMOMORPHISMS AND IDEALS 


As with any collection of mathematical objects, we must specify the allowable
maps $R \to S$ between two rings. These are the ring homomorphisms. Roughly, a
ring homomorphism is a map between rings that “respects” the addition and
multiplication operations in them. We also have a notion of kernel (those elements
sent to zero) and image. The kernel of a homomorphism has certain properties
which lead to the definition of a two-sided ideal: the kernel is a two-sided ideal.
The image of a homomorphism $f: R \to S$ is itself a ring, a subring of $S$, and
there is an isomorphism $R/\ker f \cong \operatorname{im}(f)$. We have the notion of
an ideal “generated” by a set of elements: the smallest ideal containing those
elements, which makes sense because an intersection of ideals is an ideal. We also
have the notion of the subring generated by a set of elements, which is the
smallest subring containing them.
Definition 7.1. A homomorphism $f: R \to S$ of rings is a map such that
$f(xy) = f(x)f(y)$ and $f(x + y) = f(x) + f(y)$ for all $x, y \in R$, and
$f(1_R) = 1_S$. If $f$ is a bijective ring homomorphism we call it an isomorphism,
say that $R$ is isomorphic to $S$, and denote this by $R \cong S$. In this case
$f^{-1}$ is a ring homomorphism too.


The image of a ring homomorphism $\phi: R \to S$ is a subring of $S$.

If $R$ is a subring of a ring $S$, the inclusion $R \to S$ is a ring homomorphism.
For example, the inclusions
$\mathbb{Z} \to \mathbb{Q} \to \mathbb{R} \to \mathbb{C} \to \mathbb{C}[x]$ are all
ring homomorphisms.

A composition of ring homomorphisms is a ring homomorphism. 


Example 7.2. If $R$ is any ring with identity, there is a unique ring homomorphism
$\phi: \mathbb{Z} \to R$. We must have $\phi(1) = 1$ because that is a requirement of
every ring homomorphism; it follows that if $n$ is a positive integer, then
$$\phi(n) = \phi(1 + \cdots + 1) = \phi(1) + \cdots + \phi(1) = 1 + \cdots + 1 = n,$$
where there are $n$ terms in each of these sums, and
$0 = \phi(n - n) = \phi(n) + \phi(-n) = n + \phi(-n)$, so $\phi(-n) = -n$. Notice that
we have taken the liberty of writing $n$ for the element in $R$ that is the sum of
$1_R$ with itself $n$ times. Hopefully, this will not cause confusion! Of course,
$\phi$ sends $-n$ to $(-1) + (-1) + \cdots + (-1)$, the sum of $-1$ taken $n$ times.
Hence if $x \in R$ and $n \in \mathbb{Z}$ we often write $nx$ for $x + \cdots + x$,
the sum of $x$ taken $n$ times (if $n > 0$); this is the product of $x$ and $n$,
where $n$ is viewed as an element of $R$ via $\phi$.


You have been working with polynomials for many years: when you plug a value 
into a polynomial you are applying a homomorphism. 


Example 7.3. Let $k$ be a field and $R$ a larger commutative ring containing $k$.
Fix some element $\lambda \in R$. Then each polynomial in $k[x]$ can be evaluated
at $\lambda$ by plugging in $\lambda$. That is, every time you see $x$, replace it by
$\lambda$ and then evaluate the resulting expression in $R$ to get an element
$f(\lambda)$ in $R$.

The rule $\varepsilon: k[x] \to R$ defined by
$$\varepsilon(f) = f(\lambda)$$
is a ring homomorphism. Explicitly, if $f(x) = a_0 + a_1 x + \cdots + a_n x^n$, then
$$\varepsilon(f) = a_0 + a_1 \lambda + \cdots + a_n \lambda^n.$$
You should check that $\varepsilon$ is a ring homomorphism. This is very easy
because it simply says something you have known for many years, namely
$(f + g)(\lambda) = f(\lambda) + g(\lambda)$ and $(fg)(\lambda) = f(\lambda)g(\lambda)$.
All this is no accident. Evaluating polynomials had been going on for many
centuries before the abstract notions of rings and homomorphisms were introduced,
and those notions were introduced so as to formalize and make precise what had
long been going on.


The image of the homomorphism $\varepsilon: k[x] \to R$ is denoted by $k[\lambda]$.
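Here is a small sketch of the evaluation homomorphism with $k = \mathbb{Q}$ and $R = \mathbb{C}$. The function `evaluate` computes $f(\lambda)$ by Horner's rule; both the name and the choice of method are ours. The checks confirm $\varepsilon(f + g) = \varepsilon(f) + \varepsilon(g)$ and $\varepsilon(fg) = \varepsilon(f)\varepsilon(g)$ for one pair of polynomials at a few values of $\lambda$.

```python
from fractions import Fraction

def evaluate(f, lam):
    """epsilon(f) = f(lam) for f = [a_0, ..., a_n], using Horner's rule."""
    result = 0
    for a in reversed(f):      # a_n first, then a_{n-1}, ..., then a_0
        result = result * lam + a
    return result

f = [1, 1]            # 1 + x
g = [-1, 1]           # -1 + x
f_plus_g = [0, 2]     # 2x
fg = [-1, 0, 1]       # (1 + x)(-1 + x) = x^2 - 1, multiplied out by hand

for lam in (Fraction(2, 3), 1j, 5):   # lambda in Q, and lambda = i in C
    assert evaluate(f_plus_g, lam) == evaluate(f, lam) + evaluate(g, lam)
    assert evaluate(fg, lam) == evaluate(f, lam) * evaluate(g, lam)
print(evaluate(fg, 1j))               # (-2+0j), i.e. i^2 - 1 = -2
```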


Example 7.4. Complex conjugation, $z \mapsto \bar{z}$, is an isomorphism
$\phi: \mathbb{C} \to \mathbb{C}$. When you first met complex conjugation you will
have checked that $\overline{wz} = \bar{w}\bar{z}$ and
$\overline{w + z} = \bar{w} + \bar{z}$. In other words, you checked then that
complex conjugation is a ring homomorphism.


Example 7.5. The map $\phi: \mathbb{F}_4 \to \mathbb{F}_4$ defined by
$\phi(a) = a + 1$ is a ring homomorphism. You should check this by using the
addition and multiplication tables for $\mathbb{F}_4$.
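The check suggested in Example 7.5 can be automated. This sketch uses our own encoding: the elements $0, 1, a, a + 1$ are the integers 0, 1, 2, 3, addition is bitwise XOR, and multiplication is read from a table built from $a^2 = a + 1$. As a ring homomorphism, $\phi$ must also send $a + 1$ to $\phi(a) + \phi(1) = a$.

```python
# Encode c0 + c1*a in F_4 as the integer c0 + 2*c1: 0, 1, a -> 2, a + 1 -> 3.
# Addition is coordinatewise mod 2, i.e. bitwise XOR.
def add(x, y):
    return x ^ y

# Multiplication table derived from a^2 = a + 1.
MUL = [[0, 0, 0, 0],
       [0, 1, 2, 3],
       [0, 2, 3, 1],
       [0, 3, 1, 2]]

def mul(x, y):
    return MUL[x][y]

# phi fixes 0 and 1 and swaps a and a + 1.
phi = {0: 0, 1: 1, 2: 3, 3: 2}

F4 = range(4)
assert phi[1] == 1
assert all(phi[add(x, y)] == add(phi[x], phi[y]) for x in F4 for y in F4)
assert all(phi[mul(x, y)] == mul(phi[x], phi[y]) for x in F4 for y in F4)
```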
Remark. When defining a ring homomorphism $\phi: R \to S$, we are often lazy: we
don’t always give an explicit formula for $\phi(r)$ for every $r \in R$. For example,
we might define $\phi: \mathbb{Q}[x] \to \mathbb{R}$ by saying that $\phi$ is the ring
homomorphism defined by $\phi(x) = \sqrt{7}$. What we are really saying is that
there is a unique ring homomorphism $\phi: \mathbb{Q}[x] \to \mathbb{R}$ such that
$\phi(x) = \sqrt{7}$, and that it is then routine to figure out what $\phi(f)$ is for
every $f \in \mathbb{Q}[x]$.

Because $\phi$ is a ring homomorphism, $\phi(a_0 + a_1 x + \cdots + a_n x^n)$ must
equal
$$\phi(a_0) + \phi(a_1)\phi(x) + \cdots + \phi(a_n)\phi(x)^n.$$
Now, $\phi(x)^i = (\sqrt{7})^i$, so we only need to know what $\phi(a)$ is for
$a \in \mathbb{Q}$. The restriction of $\phi$ to
$\mathbb{Z} \subset \mathbb{Q} \subset \mathbb{Q}[x]$ is a ring homomorphism
$\mathbb{Z} \to \mathbb{R}$. But, as discussed in Example 7.2, there is only one
ring homomorphism $\mathbb{Z} \to \mathbb{R}$, and this is the inclusion. Thus
$\phi(n) = n$ for all $n \in \mathbb{Z}$. If $n$ is a non-zero integer, then
$$1 = \phi(1) = \phi(n \cdot n^{-1}) = n \cdot \phi(n^{-1}),$$
so $\phi(n^{-1}) = n^{-1}$, and $\phi(m/n) = \phi(m)\phi(n^{-1}) = mn^{-1}$, so
$$\phi(a_0 + a_1 x + \cdots + a_n x^n) = a_0 + a_1\sqrt{7} + \cdots + a_n(\sqrt{7})^n.$$
The same sort of laziness is employed in defining the homomorphism $\phi$ in
Example 7.4.


Example 7.6. If $m$ and $n$ are relatively prime positive integers, there is an
isomorphism of rings
$$\mathbb{Z}_{mn} \cong \mathbb{Z}_m \times \mathbb{Z}_n.$$
To see this, define $\phi: \mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ by
$$\phi[a + (mn)] = ([a + (m)], [a + (n)]).$$
First, $\phi$ is well-defined. If $[a + (mn)] = [b + (mn)]$, then $a - b$ is a
multiple of $mn$, hence a multiple of both $m$ and $n$, so
$[a + (m)] = [b + (m)]$ and $[a + (n)] = [b + (n)]$. It is easy to check that $\phi$
is a ring homomorphism (you should do it even/especially if I don’t). We now claim
that $\phi$ is an isomorphism; to check this we must show it is bijective. Both
$\mathbb{Z}_{mn}$ and $\mathbb{Z}_m \times \mathbb{Z}_n$ have $mn$ elements, so it
suffices to show that $\phi$ is injective. If $\phi[a + (mn)] = \phi[b + (mn)]$, then
$a - b$ is divisible by both $m$ and $n$, and hence by their product because
$\gcd(m, n) = 1$. It follows that $[a + (mn)] = [b + (mn)]$, which shows that $\phi$
is injective and hence an isomorphism.
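For a concrete case, take $m = 4$ and $n = 9$. This Python sketch is a finite check, not a proof. It verifies that $\phi$ is a bijective ring homomorphism $\mathbb{Z}_{36} \to \mathbb{Z}_4 \times \mathbb{Z}_9$, and that injectivity fails when $\gcd(m, n) \neq 1$. The name `crt_map` is hypothetical.

```python
from math import gcd

def crt_map(a, m, n):
    """phi[a + (mn)] = ([a + (m)], [a + (n)]), with residues as representatives."""
    return (a % m, a % n)

m, n = 4, 9
assert gcd(m, n) == 1
images = [crt_map(a, m, n) for a in range(m * n)]
assert len(set(images)) == m * n        # injective on mn elements, hence bijective

for a in range(m * n):
    for b in range(m * n):
        (x0, x1), (y0, y1) = crt_map(a, m, n), crt_map(b, m, n)
        # phi respects addition and multiplication, computed componentwise
        assert crt_map((a + b) % (m * n), m, n) == ((x0 + y0) % m, (x1 + y1) % n)
        assert crt_map(a * b % (m * n), m, n) == ((x0 * y0) % m, (x1 * y1) % n)

# With m = 2 and n = 4 (not coprime), the same formula is not injective:
assert len({crt_map(a, 2, 4) for a in range(8)}) == 4
```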


7.1. Ideals. 


Definition 7.7. An ideal of a ring $R$ is a subset $I$ which is a subgroup under
$+$, and contains $ar$ whenever $r \in R$ and $a \in I$.

If $A$ is a subset of $R$ we define the ideal generated by $A$ as the smallest ideal
containing $A$.


Notation. Let $a \in R$. The ideal generated by $a$ is
$$Ra := \{ra \mid r \in R\}.$$
We call $Ra$ the principal ideal generated by $a$, and sometimes denote it by $(a)$.
It is easy to verify that the ideal generated by $a_1, \dots, a_n \in R$ is
$$Ra_1 + \cdots + Ra_n := \{r_1 a_1 + \cdots + r_n a_n \mid r_1, \dots, r_n \in R\}.$$
We sometimes denote this ideal by $(a_1, \dots, a_n)$.
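As an illustration in $R = \mathbb{Z}$ (our example, not the notes'), we can sample elements of the ideal $(12, 18) = \mathbb{Z}12 + \mathbb{Z}18$. The two assertions together show that it equals the principal ideal $(6)$.

```python
from math import gcd

# A finite sample of the ideal Z*12 + Z*18 = {r1*12 + r2*18 : r1, r2 in Z}.
sample = {r1 * 12 + r2 * 18 for r1 in range(-5, 6) for r2 in range(-5, 6)}

print(sorted(x for x in sample if 0 <= x <= 30))   # [0, 6, 12, 18, 24, 30]
# 12 and 18 are multiples of 6, so every r1*12 + r2*18 is too ...
assert all(x % gcd(12, 18) == 0 for x in sample)
# ... and 6 = (-1)*12 + 1*18 lies in the ideal, so Z*12 + Z*18 = Z*6.
assert 6 in sample
```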


Basic results. The ring R itself is an ideal, and so is 0 = {0}. 
If an ideal $I$ contains a unit, say $u$, then $I = R$. Indeed, if $r \in R$, then
$r = u \cdot u^{-1}r$ is a multiple of an element in $I$, so it belongs to $I$. It
follows that the only ideals of a field are the zero ideal and the field itself. The
converse is also true. Suppose $R$ has no ideals other than zero and itself, and
$0 \neq u \in R$. Then $uR$ is an ideal, so it must equal $R$, whence $1 = uv$ for
some $v \in R$, so $u$ is a unit.

You should check that if $I$ and $J$ are ideals, so is their intersection $I \cap J$,
and their sum
$$I + J := \{i + j \mid i \in I, j \in J\},$$
and their product
$$IJ := \{i_1 j_1 + \cdots + i_n j_n \mid n \geq 1,\ i_1, \dots, i_n \in I,\ j_1, \dots, j_n \in J\}.$$
Definition 7.8. Let $\phi: R \to S$ be a ring homomorphism. The kernel of $\phi$ is
$$\ker\phi := \{a \in R \mid \phi(a) = 0\}.$$


Proposition 7.9. The kernel of a ring homomorphism $\phi: R \to S$ is an ideal of
$R$.

Proof. Since $\phi$ is a group homomorphism $(R, +) \to (S, +)$, $\ker\phi$ is
certainly a subgroup of $(R, +)$. Moreover, if $a \in \ker\phi$ and $r \in R$, then
$\phi(ar) = \phi(a)\phi(r)$, and this is zero because $\phi(a) = 0$.


Exercise. Show that a ring homomorphism $\phi: R \to S$ is injective if and only if
$\ker\phi = \{0\}$.


The quotient by an ideal. If $I$ is a two-sided ideal of $R$, we write $R/I$ for the
set of cosets
$$[x + I] := \{x + a \mid a \in I\}.$$
Thus $R/I$ has the same meaning as it did in group theory, and $R/I$ becomes an
abelian group under the induced addition. In fact, $R/I$ can be given the structure
of a ring by defining
$$[x + I] + [y + I] := [x + y + I] \quad \text{and} \quad [x + I] \cdot [y + I] := [xy + I]$$
for all $x, y \in R$. One must check that these definitions are unambiguous, and
that they do make $R/I$ a ring. The zero element in $R/I$ is $[0 + I] = I$, and the
identity is $[1 + I]$. If $R$ is commutative, so is $R/I$.


Proposition 7.10. The map $R \to R/I$ defined by $x \mapsto [x + I]$ is a surjective
ring homomorphism with kernel $I$.


Proof. Left to the reader. This is easy, but you should prove it once in your life 
in order to understand why it is easy! The point is to see how the definition of 
addition and multiplication in R/I synchronizes with the axioms for a map to be 
a ring homomorphism. 


We will make frequent use of the previous result, and especially its companion,
Proposition 7.11 below, which applies this result to quotients of polynomial rings.
Let $k$ be a field and $I$ an ideal of $k[x]$. Suppose that $I \neq k[x]$, so
$k[x]/I$ is not the zero ring. The inclusion $k \to k[x]$ composed with the
homomorphism $k[x] \to k[x]/I$, $a \mapsto [a + I]$, gives a homomorphism
$\psi: k \to k[x]/I$. Since every non-zero element of $k$ is a unit and a proper
ideal contains no units, $I \cap k = \{0\}$, so the kernel of $\psi$ is zero, whence
$\psi$ is injective. Because the map $\psi: k \to k[x]/I$ is injective, we may
identify $k$ with its image in $k[x]/I$. We often do this. For example, this is what
we do in the next result: to make sense of the statement of Proposition 7.11 we
must view $k$ as a subring of $k[x]/(f)$.


Proposition 7.11. Let $k$ be a field and $f \in k[x]$ a polynomial of degree $n \ge 1$. Write $\bar{x}$ for $[x + (f)]$, the coset containing $x$, and view $k$ as a subring of $k[x]/(f)$. Then every element of $k[x]/(f)$ can be written as
$$\lambda_0 + \lambda_1 \bar{x} + \cdots + \lambda_{n-1} \bar{x}^{\,n-1}$$
for unique elements $\lambda_0, \lambda_1, \dots, \lambda_{n-1} \in k$.


Proof. We need to use Proposition 11.1 to prove this: given $g \in k[x]$ we can write $g = qf + r$ for some $q, r \in k[x]$ with $\deg r < n$. You can either take this on faith or look ahead to Proposition 11.1 in Section 11.

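Obviously $[g + (f)] = [r + (f)]$, so
$$k[x]/(f) = \{\, [g + (f)] \mid g \in k[x] \,\} = \{\, [r + (f)] \mid \deg r < n \,\}.$$
If $r = \lambda_0 + \lambda_1 x + \cdots + \lambda_{n-1} x^{n-1}$, then
$$\begin{aligned}
[r + (f)] &= [\lambda_0 + \lambda_1 x + \cdots + \lambda_{n-1} x^{n-1} + (f)] \\
&= [\lambda_0 + (f)] + [\lambda_1 + (f)][x + (f)] + \cdots + [\lambda_{n-1} + (f)][x + (f)]^{n-1} \\
&= \lambda_0 + \lambda_1 \bar{x} + \cdots + \lambda_{n-1} \bar{x}^{\,n-1}.
\end{aligned}$$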


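The uniqueness is because if
$$\lambda_0 + \lambda_1 \bar{x} + \cdots + \lambda_{n-1} \bar{x}^{\,n-1} = \mu_0 + \mu_1 \bar{x} + \cdots + \mu_{n-1} \bar{x}^{\,n-1},$$
then $(\lambda_0 - \mu_0) + (\lambda_1 - \mu_1)x + \cdots + (\lambda_{n-1} - \mu_{n-1})x^{n-1}$ is in $(f)$. But the only polynomial of degree $< n$ that is a multiple of $f$ is the zero polynomial, so $\lambda_i = \mu_i$ for all $i$.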
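As an illustration of Proposition 7.11, here is a minimal Python sketch of arithmetic in $\mathbb{Q}[x]/(f)$. It assumes polynomials are stored as lists of coefficients, lowest degree first, with exact rational entries; the function names are our own. Each coset $[g + (f)]$ is represented by the remainder of $g$ on division by $f$, exactly as in the proof, and cosets are multiplied by multiplying representatives and reducing again.

```python
from fractions import Fraction

def trim(p):
    """Drop trailing zero coefficients (the zero polynomial becomes [])."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def reduce_mod(g, f):
    """Return the remainder r of g on division by f != 0, so deg r < deg f."""
    f = trim(f)
    r = [Fraction(c) for c in trim(g)]
    n = len(f) - 1
    lead = Fraction(f[-1])
    while len(r) - 1 >= n:
        # Cancel the leading term of r with a multiple x^(deg r - n) * f.
        shift = len(r) - 1 - n
        c = r[-1] / lead
        for i, fc in enumerate(f):
            r[shift + i] -= c * fc
        r = trim(r)
    return r

def mul(p, q):
    """Product of two polynomials in Q[x]."""
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def quotient_mul(p, q, f):
    """Multiply the cosets [p + (f)] and [q + (f)]; return the reduced representative."""
    return reduce_mod(mul(p, q), f)
```

For example, with $f = x^2 + 1$ the call `quotient_mul([0, 1], [0, 1], [1, 0, 1])` returns `[Fraction(-1, 1)]`, which says $\bar{x}^2 = -1$ in $\mathbb{Q}[x]/(x^2 + 1)$.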


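Proposition 7.12. Let $\phi: S \to R$ be a ring homomorphism. There is an isomorphism of rings
$$S/\ker\phi \cong \operatorname{im}(\phi).$$

Proof. Write $I = \ker\phi$. Since $\phi$ is a group homomorphism, the proof of the analogous result for groups already shows that the map $\theta: S/I \to \operatorname{im}\phi$ defined by $\theta([x + I]) = \phi(x)$ is an isomorphism of abelian groups. So all that remains is to check that $\theta$ respects multiplication and sends $[1 + I]$ to $1$, and this follows at once from the definition of multiplication in $S/I$:
$$\theta([x + I][y + I]) = \theta([xy + I]) = \phi(xy) = \phi(x)\phi(y) = \theta([x + I])\,\theta([y + I]),$$
and $\theta([1 + I]) = \phi(1) = 1$.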


Exercise. Let $\phi: R \to S$ be a ring homomorphism and $I$ an ideal of $R$ such that $\phi(I) = 0$. Let $\pi: R \to R/I$ be the natural map. Show there is a unique homomorphism $\bar{\phi}: R/I \to S$ such that $\phi = \bar{\phi}\pi$.


Example 7.13. If $\lambda \in k$, then $k[x]/(x - \lambda) \cong k$. To see this, let $\phi: k[x] \to k$ be the homomorphism given by plugging in $\lambda$; that is, $\phi(f) = f(\lambda)$. Clearly $\phi$ is surjective: if $a \in k$, then $a = \phi(a)$! If $f$ is a multiple of $x - \lambda$, then $\phi(f) = 0$, so $\ker\phi \supseteq (x - \lambda)$. However, a polynomial $f$ can be written as $f = (x - \lambda)q + f(\lambda)$ for a suitable $q \in k[x]$, so we see that $\phi(f) = f(\lambda) \ne 0$ if $f$ is not a multiple of $x - \lambda$. Hence $\ker\phi = (x - \lambda)$. The isomorphism $k[x]/(x - \lambda) \cong k$ now follows from Proposition 7.12.
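The identity $f = (x - \lambda)q + f(\lambda)$ used above is exactly what synthetic division (Horner's rule) computes. The Python sketch below is illustrative only; the function names and the convention of listing coefficients from the highest degree down are our own choices.

```python
from fractions import Fraction

def divide_by_linear(coeffs, lam):
    """Synthetic division of f by (x - lam).

    `coeffs` lists the coefficients of f from the HIGHEST degree down and must
    be non-empty. Returns (q, r) with f = (x - lam) * q + r, where q is listed
    highest degree first and r is a constant.
    """
    running = []
    acc = Fraction(0)
    for c in coeffs:
        acc = acc * lam + c   # Horner's rule
        running.append(acc)
    r = running.pop()         # the final Horner value is f(lam)
    return running, r

def evaluate(coeffs, lam):
    """Plug lam into f (coefficients listed highest degree first)."""
    value = Fraction(0)
    for c in coeffs:
        value = value * lam + c
    return value
```

For instance, `divide_by_linear([1, 0, 0, 1], 2)` returns the quotient coefficients 1, 2, 4 and the remainder 9 (as `Fraction` objects), i.e. $x^3 + 1 = (x - 2)(x^2 + 2x + 4) + 9$ with $9 = f(2)$, so $x^3 + 1 \notin (x - 2)$.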
The kernel of a homomorphism $\phi: R \to S$ provides a precise measure of the (lack of) injectivity of $\phi$: $\phi$ is injective if and only if $\ker\phi = \{0\}$; more explicitly, since $\phi(a) = \phi(b)$ if and only if $a - b \in \ker\phi$, one sees that $\phi(a) = \phi(b)$ if and only if $[a + \ker\phi] = [b + \ker\phi]$. It follows that the fibers of $\phi$, that is the subsets
$$\phi^{-1}(s) := \{\, r \in R \mid \phi(r) = s \,\} \subseteq R, \qquad s \in S,$$
are either empty or cosets of $\ker\phi$.


Lemma 7.14. If J is an ideal in R containing an ideal I, then J/I is an ideal of 
R/I and every ideal of R/I is obtained in this way. 


Proof. See exercise (2) in the exercises at the end of Section 10.
7.2. Maximal ideals and fields. 
Lemma 7.15. Let $u \in R$. Then $u$ is a unit if and only if $(u) = R$.


Proof. ($\Rightarrow$) If $u$ is a unit, there is an element $v \in R$ such that $1 = uv$; thus $r = uvr$ for $r \in R$; in other words, every element of $R$ is a multiple of $u$, which says that $R = (u)$.

($\Leftarrow$) If $R = (u)$, then every element of $R$ is a multiple of $u$. In particular, 1 is, so $1 = uv$ for some $v \in R$, thus showing that $u$ is a unit.


Lemma 7.16. A commutative ring R is a field if and only if its only ideals are 0 
and R itself. 


Proof. ($\Rightarrow$) If $I$ is a non-zero ideal of $R$ it contains a non-zero element, say $u$. But $u$ is a unit by the hypothesis, so $r = (ru^{-1})u \in I$ for every $r \in R$. That is, $I = R$.

($\Leftarrow$) Let $u$ be a non-zero element of $R$. Then the ideal $(u)$ is non-zero, so is equal to $R$ by hypothesis; Lemma 7.15 now implies that $u$ is a unit.


An ideal $I$ in a ring $R$ is maximal if $I \ne R$ and the only ideal that contains $I$ but is not equal to it is $R$ itself.

One sees easily that $(x - \lambda)$ is a maximal ideal of $k[x]$, but in general there will be other maximal ideals; that is the case when $k$ is not algebraically closed (see below).


Lemma 7.17. An ideal $I$ in a commutative ring $R$ is maximal if and only if $R/I$ is a field.


Proof. This follows immediately from Lemmas 7.14 and 7.16, but here is an alternative proof.

Suppose that $I$ is maximal. A non-zero element of $R/I$ can be written as $[a + I]$ for some $a \notin I$. Since $aR + I$ is an ideal strictly containing $I$ and $I$ is maximal, $aR + I = R$. Hence there are elements $b \in R$ and $c \in I$ such that $1 = ab + c$. In $R/I$,
$$[a + I][b + I] = [ab + I] = [1 - c + I] = [1 + I] = 1_{R/I}.$$
Hence $[b + I]$ is the inverse in $R/I$ of $[a + I]$. Since $I \ne R$, the ring $R/I$ is not the zero ring, so this shows that $R/I$ is a field.

Conversely, suppose that $R/I$ is a field. Then $1 \notin I$, so $I \ne R$. Let $J$ be an ideal of $R$ that is strictly larger than $I$. There is an element $a \in J \setminus I$. Since $[a + I]$ is a non-zero element of $R/I$, it has an inverse, say $[b + I]$. Since $[ab + I] = [1 + I]$, we have $1 - ab \in I$, and so $1 \in aR + I \subseteq J$. Hence $J = R$, showing that $I$ is maximal.
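Lemma 7.17, together with Proposition 7.19 below, suggests a quick experiment: $n\mathbb{Z}$ is maximal exactly when $\mathbb{Z}/n\mathbb{Z}$ is a field, and one expects this to happen precisely for prime $n$. The Python sketch below (hypothetical helper names, brute force, so only sensible for small $n$) tests the field condition directly from the definition.

```python
from math import isqrt

def is_field_Zn(n):
    """Brute-force test of whether Z/nZ is a field: every non-zero class [a]
    needs some [b] with [a][b] = [1]."""
    if n < 2:
        return False  # Z/1Z is the zero ring, which is not a field
    return all(any(a * b % n == 1 for b in range(1, n)) for a in range(1, n))

def is_prime(n):
    """Trial division up to the integer square root of n."""
    return n >= 2 and all(n % d != 0 for d in range(2, isqrt(n) + 1))

print([n for n in range(2, 20) if is_field_Zn(n)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Comparing `is_field_Zn(n)` with `is_prime(n)` over a range of $n$ shows the two agree, as the lemma leads one to expect.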


Example 7.18. $\mathbb{R}[x]/(x^2 + 1) \cong \mathbb{C}$.


7.3. Ideals in Z. 
Proposition 7.19. The ideals in $\mathbb{Z}$ are $n\mathbb{Z}$, $n \ge 0$.


Proof. First observe that each nZ is an ideal of Z. On the other hand, we showed 
in Math 402 that the only subgroups of (Z,+) are the various nZ, and since an 
ideal is first a subgroup, the result follows. 


Here is a cute observation regarding the notation (a,b) in the ring Z. We use this 
notation for two different things: it denotes the ideal generated by a and b, namely 
aZ + bZ, and it denotes the greatest common divisor of a and b. Now Proposition 
7.19 tells us that the ideal (a,b) is equal to dZ, or (d), for some integer d; it turns 
out that we can take d to be the greatest common divisor of a and b so the two 
notations have (almost) the same meaning after all. 

To prove this claim, let $d$ denote the gcd of two non-zero integers $a$ and $b$. To see that $(a,b) = (d)$ notice first that $(a) \subseteq (d)$ because $d \mid a$, and $(b) \subseteq (d)$ because $d \mid b$; so, as ideals are closed under $+$, $(d) \supseteq (a) + (b) = (a,b)$. The reverse inclusion also holds because there are integers $u$ and $v$ such that $d = au + bv$, and hence $d \in (a,b)$ and hence $(d) \subseteq (a,b)$.
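The integers $u$ and $v$ in the last step can be computed by the extended Euclidean algorithm. Here is a minimal Python sketch; the name `extended_gcd` is our own.

```python
def extended_gcd(a, b):
    """Return (d, u, v) with d = gcd(a, b) >= 0 and d = a*u + b*v."""
    # Invariants: old_r = a*old_u + b*old_v and r = a*u + b*v.
    old_r, r = a, b
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    if old_r < 0:  # normalise so that the gcd is non-negative
        old_r, old_u, old_v = -old_r, -old_u, -old_v
    return old_r, old_u, old_v
```

For example, `extended_gcd(12, 18)` returns `(6, -1, 1)`: $6 = 12 \cdot (-1) + 18 \cdot 1$, so $6 \in (12, 18)$ and hence $(12, 18) = (6)$.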


8. SOME EXAMPLES 


There is nothing difficult about the examples in this section. Mostly it is a matter of becoming familiar with the notation and the ideas, and that does take some time. Practice, practice, practice!

The ideal $(x^3 + 1,\ x^2 - 2)$ in $\mathbb{Q}[x]$ is equal to $\mathbb{Q}[x]$. Write $I = (x^3 + 1,\ x^2 - 2)$. Recall that $(a,b)$ denotes the ideal of a ring $R$ consisting of the elements $\{ar + bs \mid r, s \in R\}$. Dividing $x^3 + 1$ by $x^2 - 2$ and finding the remainder gives
$$x^3 + 1 = x(x^2 - 2) + 2x + 1,$$
which can be rewritten as $2x + 1 = (x^3 + 1) \cdot 1 + (x^2 - 2) \cdot (-x)$, so $2x + 1 \in I$. Dividing $x^2 - 2$ by $2x + 1$ and finding the remainder gives
$$x^2 - 2 = (2x + 1) \cdot \tfrac{1}{4}(2x - 1) - \tfrac{7}{4},$$
from which it follows that $\tfrac{7}{4} \in I$, and hence $\tfrac{4}{7} \cdot \tfrac{7}{4} = 1 \in I$. But once $1 \in I$, so is $r \cdot 1$ for every $r \in \mathbb{Q}[x]$, so $I = \mathbb{Q}[x]$.
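Running the two division steps backwards gives an explicit expression $1 = a(x)(x^3 + 1) + b(x)(x^2 - 2)$; our own computation yields $a(x) = \tfrac{1}{7}(2x - 1)$ and $b(x) = -\tfrac{1}{7}(2x^2 - x + 4)$. The short Python check below multiplies this out with exact rational arithmetic; the helper names are hypothetical and coefficient lists run from the lowest degree up.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, lowest degree first."""
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Add two coefficient lists and drop trailing zeros."""
    n = max(len(p), len(q))
    p = p + [F(0)] * (n - len(p))
    q = q + [F(0)] * (n - len(q))
    out = [a + b for a, b in zip(p, q)]
    while len(out) > 1 and out[-1] == 0:
        out.pop()
    return out

f = [F(1), F(0), F(0), F(1)]          # x^3 + 1
g = [F(-2), F(0), F(1)]               # x^2 - 2
a = [F(-1, 7), F(2, 7)]               # (2x - 1)/7
b = [F(-4, 7), F(1, 7), F(-2, 7)]     # -(2x^2 - x + 4)/7

combination = poly_add(poly_mul(a, f), poly_mul(b, g))
print(combination)                    # [Fraction(1, 1)], i.e. a*f + b*g = 1
```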

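There is an isomorphism of rings $\mathbb{C}[x]/(x^2 - 1) \cong \mathbb{C} \times \mathbb{C}$. Almost always when one wants to establish an isomorphism of the form $S/I \cong R$, one does so by finding a surjective ring homomorphism $\phi: S \to R$ such that $\ker\phi = I$ and then invoking Proposition 7.12. That is what we do here.

Define $\phi: \mathbb{C}[x] \to \mathbb{C} \times \mathbb{C}$ by
$$\phi(f) = (f(1), f(-1)).$$
It is elementary to show that $\phi$ is a homomorphism: recall from Example 7.3 that "plugging in" is a ring homomorphism; you should check this! To see that $\phi$ is surjective observe that
$$(a, b) = \phi\left(\tfrac{a}{2}(x + 1) - \tfrac{b}{2}(x - 1)\right).$$
Now
$$f \in \ker\phi \iff \phi(f) = (0,0) \iff f(1) = f(-1) = 0 \iff x^2 - 1 \text{ divides } f,$$
where the last step uses Example 7.13 twice (if $f(1) = 0$ then $f = (x - 1)h$, and $f(-1) = -2h(-1) = 0$ forces $x + 1$ to divide $h$). So $\ker\phi = (x^2 - 1)$.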
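A quick sanity check of this example, as a Python sketch: we use rational coefficients (a subring of $\mathbb{C}$) so that the arithmetic is exact, and the helper names are our own. The assertions test multiplicativity on one pair of polynomials, that $x^2 - 1 \in \ker\phi$, and that the polynomial $\tfrac{a}{2}(x + 1) - \tfrac{b}{2}(x - 1)$ is sent to $(a, b)$; they illustrate rather than prove the claims.

```python
from fractions import Fraction

def evaluate(f, t):
    """Value at t of the polynomial with coefficients f (lowest degree first)."""
    return sum(c * t ** i for i, c in enumerate(f))

def phi(f):
    """The homomorphism of the text: f |-> (f(1), f(-1))."""
    return (evaluate(f, 1), evaluate(f, -1))

def poly_mul(p, q):
    """Product of two coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def preimage(a, b):
    """Coefficients of (a/2)(x + 1) - (b/2)(x - 1), which phi sends to (a, b)."""
    a, b = Fraction(a), Fraction(b)
    return [a / 2 + b / 2, a / 2 - b / 2]

p, q = [1, 2, 3], [-1, 0, 4]
assert phi(poly_mul(p, q)) == (phi(p)[0] * phi(q)[0], phi(p)[1] * phi(q)[1])
assert phi([-1, 0, 1]) == (0, 0)       # x^2 - 1 lies in the kernel
assert phi(preimage(3, 5)) == (3, 5)   # the surjectivity formula
```

The same code works unchanged with complex coefficients, at the cost of floating-point rounding.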

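The same sort of argument will show that $\mathbb{R}[x]/(x^2 - 1) \cong \mathbb{R} \times \mathbb{R}$ and $\mathbb{Q}[x]/(x^2 - 1) \cong \mathbb{Q} \times \mathbb{Q}$. However, the argument would not show that $\mathbb{Z}[x]/(x^2 - 1)$ is isomorphic to $\mathbb{Z} \times \mathbb{Z}$, so let's ask the question: is $\mathbb{Z}[x]/(x^2 - 1)$ isomorphic to $\mathbb{Z} \times \mathbb{Z}$? Hint: the element $e = (1,0)$ in $\mathbb{Z} \times \mathbb{Z}$ satisfies $e^2 = e$; is there an element of $\mathbb{Z}[x]/(x^2 - 1)$, other than 0 and 1, that is equal to its own square?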


If $d \in \mathbb{Q}$ is not a square, then $\mathbb{Q}[x]/(x^2 - d) \cong \mathbb{Q}(\sqrt{d})$. Define $\phi: \mathbb{Q}[x] \to \mathbb{Q}(\sqrt{d})$ by $\phi(f) := f(\sqrt{d})$. This is surjective (why?) and has kernel equal to $(x^2 - d)$ (why?).


$\mathbb{R}[x]/(x^2 + x + 3) \cong \mathbb{C}$. To prove this by invoking Proposition 7.12 we first need a homomorphism $\phi: \mathbb{R}[x] \to \mathbb{C}$ whose kernel is $(x^2 + x + 3)$. If we write $\alpha = \phi(x)$, then $\alpha$ must satisfy $\alpha^2 + \alpha + 3 = \phi(x^2 + x + 3) = 0$ in $\mathbb{C}$. So, let $\alpha$ be one of the complex zeroes of $x^2 + x + 3$ and define $\phi$ by
$$\phi(f) := f(\alpha).$$
Then $\ker\phi = (x^2 + x + 3)$ as required.


$\mathbb{F}_2[x]/(x^2 + x + 1) \cong \mathbb{F}_4$.


Some Exercises. 
In all these exercises, the elements $a, b, c, \dots$ belong to a commutative ring $R$.

(1) If $a = bc + d$ show that $(a,c) = (d,c)$.

(2) Show that $\mathbb{Z}[x]/(x) \cong \mathbb{Z}$. (Hint: in this and the next exercise, use Proposition 7.12.)

(3) Show that $\mathbb{Z}[x]/(n) \cong \mathbb{Z}_n[x]$ for every integer $n > 0$.

(4) We define a subset $I$ of $\mathbb{Z}[x]$ as follows: $f$ is in $I$ if and only if the sum of its coefficients is zero. Show that $I$ is an ideal in two ways: first, by using the definition of an ideal; second, by exhibiting $I$ as the kernel of a homomorphism. Which method is easier?

(5) Find generators for the ideal $I$ in the previous question.

(6) We define a subset $J$ of $\mathbb{Z}[x]$ as follows: $f$ is in $J$ if and only if the constant term of $f$ is a multiple of 8. Show that $J$ is an ideal in two ways: first, by using the definition of an ideal; second, by exhibiting $J$ as the kernel of a homomorphism. Which method is easier?

(7) Find generators for the ideal $J$ in the previous question.

(8) Let $X$ be any set and $k$ any field. Show that the set $S$ of all functions $X \to k$ can be made into a ring in an obvious way: use the addition and multiplication in $k$ to define addition and multiplication in $S$.

(9) Think of the elements of $\mathbb{R}[x,y]$, the ring of polynomials in two variables with real coefficients, as functions $\mathbb{R}^2 \to \mathbb{R}$. That is, $f(x,y)$ evaluated at the point $(a,b)$ in the real plane is $f(a,b) \in \mathbb{R}$. Let $C$ be the curve $y^2 = x(x^2 - 1)$ in $\mathbb{R}^2$. By restricting $f \in \mathbb{R}[x,y]$ to $C \subseteq \mathbb{R}^2$ we get a map $\phi: \mathbb{R}[x,y] \to S$ where $S$ is the ring of all functions $C \to \mathbb{R}$. Show that $\phi$ is a ring homomorphism, and determine its kernel.

(10) Show that $x^8 + x^7 + x^6 + x^4 + 1$ divides $x^{15} - 1$ in $\mathbb{F}_2[x]$.
9. ARITHMETIC IN Z


After learning to count and add, children learn how to multiply and divide. 
Questions about division and factorization are of primary importance in all rings. 
We begin here with Z. 

Recall that an integer $p \notin \{-1, 0, 1\}$ is prime if the only integers dividing it are $\pm 1$ and $\pm p$.

If an integer $n$ divides an integer $m$ we write $n \mid m$.


Lemma 9.1. Let $n$ be an integer $\ge 2$. Then $n$ has a positive prime divisor.


Proof. Let $\Phi = \{\, m > 1 : m \mid n \,\}$. Then $\Phi \subseteq \{2, \dots, n\}$ and $n \in \Phi$, so $\Phi$ has a smallest element, say $p$. Certainly $p$ divides $n$. If $p$ were not prime there would be a positive integer $q$ dividing $p$ and satisfying $1 < q < p$; but $q$ would then divide $n$, so belong to $\Phi$, and would be smaller than $p$, contradicting the choice of $p$. We conclude that $p$ is prime.


Theorem 9.2. There are infinitely many primes in Z. 


Proof. If there were only finitely many positive primes $p_1, \dots, p_t$, the number $m := 1 + p_1 \cdots p_t \ge 2$ would not be divisible by any $p_i$ (each $p_i$ leaves remainder 1), and this would contradict the previous lemma.


Lemma 9.3. Let $p \in \mathbb{Z} \setminus \{-1, 0, 1\}$. Then $p$ is prime if and only if it has the following property:
whenever $p \mid ab$, $p \mid a$ or $p \mid b$.


Proof. Since an integer $p$ is prime if and only if $-p$ is prime we can assume that $p$ is positive.

($\Rightarrow$) Suppose $p \mid ab$. The gcd $d = (p,a)$ is a positive integer dividing $p$ so is either 1 or $p$. If $d = p$, then $p \mid a$ and we are done, so suppose that $d = 1$. By a result in Math 402, there are integers $u, v$ such that $1 = pu + av$. Hence $b = pbu + abv$; but $p$ divides $pbu$ and $ab$, hence $abv$, so $p$ divides $pbu + abv = b$.

($\Leftarrow$) To see that $p$ is prime, suppose that $d$ divides $p$. Write $p = dc$. It suffices to show that either $c$ or $d$ is $\pm 1$. Suppose to the contrary that neither is $\pm 1$. Then the absolute values of $c$ and $d$ are both $\ge 2$, and hence the absolute values of $c$ and $d$ are both $\le p/2$.

However, since $p$ divides the product $dc$, the hypothesis implies that $p$ divides either $c$ or $d$. But $p$ cannot divide a non-zero integer whose absolute value is at most $p/2$, so we obtain a contradiction.

We conclude that either $c$ or $d$ is $\pm 1$, and hence $p$ is prime.


Theorem 9.4 (The Fundamental Theorem of Arithmetic). Let $n$ be an integer $\ge 2$. Then $n$ is a product of primes in a unique way: if
$$n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$$
with all the $p_i$ and $q_j$ positive primes, and $p_1 \le \cdots \le p_r$ and $q_1 \le \cdots \le q_s$, then $r = s$ and $p_i = q_i$ for all $i$.


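Proof. By Lemma 9.1, there is a smallest positive prime dividing $n$, say $p_1$. Write $n = p_1 n_1$. Since $p_1 \ge 2$, $n_1 \le \tfrac{1}{2}n$. If $n_1 \ge 2$, applying Lemma 9.1 to $n_1$ we can write $n = p_1 p_2 n_2$ with $p_2$ the smallest prime dividing $n_1$; then $2 \le p_1 \le p_2$ (because $p_2$ also divides $n$) and $n_2 \le \tfrac{1}{4}n$. Continuing in this way, we get
$$n = p_1 p_2 \cdots p_i\, n_i \quad\text{and}\quad n_i \le \frac{n}{2^i}.$$
Since each $n_i$ is a positive integer, this process must stop with $n_i = 1$ by the time $2^i > n$. Thus $n$ is a product of primes.

Now suppose that $n = p_1 p_2 \cdots p_r = q_1 q_2 \cdots q_s$ as in the statement of the theorem. Since $p_1$ is prime and divides $q_1 q_2 \cdots q_s$, Lemma 9.3 shows that it divides, and hence equals, some $q_j$; thus $p_1 = q_j \ge q_1$. Exchanging the roles of the two factorizations gives $q_1 \ge p_1$, so $p_1 = q_1$. Hence
$$\frac{n}{p_1} = p_2 \cdots p_r = q_2 \cdots q_s,$$
and repeating the argument we get $p_2 = q_2$, et cetera. The two lists must run out at the same time, because a product of primes is never 1; so $r = s$.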
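The existence half of this proof is really an algorithm: repeatedly split off the smallest prime divisor. A minimal Python sketch (our own helper names; trial division, so fine for small $n$ but not an efficient factoring method):

```python
def smallest_prime_divisor(n):
    """Smallest divisor m > 1 of n (n >= 2); by Lemma 9.1 it is prime."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def factor(n):
    """Prime factors of n >= 2 in non-decreasing order, found as in the
    existence half of the proof of Theorem 9.4."""
    primes = []
    while n > 1:
        p = smallest_prime_divisor(n)
        primes.append(p)
        n //= p  # n_i shrinks by a factor of at least 2 at each step
    return primes

print(factor(360))  # [2, 2, 2, 3, 3, 5]
```

Because each prime found is the smallest prime dividing the current cofactor, the list comes out sorted, matching the ordering $p_1 \le \cdots \le p_r$ in the theorem.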


More Exercises. 

We will write $[r]$ to denote the image of an element $r \in R$ in a quotient ring $R/I$. That is, $[r] = r + I$, the coset of $I$ containing $r$.

The 25 elements in the field $K = \mathbb{F}_5[x]/(f)$, where $f = x^2 + x + 1$, are
$$\{0, \alpha^i \mid 1 \le i \le 24\} = \{a\alpha + b \mid a, b \in \mathbb{F}_5\},$$
where $\alpha = [3x + 1]$.
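(1) Fill in the missing powers of $\alpha$ in the following table:
$$\begin{array}{llll}
\alpha^{1} = \alpha & \alpha^{7} = 2\alpha & \alpha^{13} = 4\alpha & \alpha^{19} = 3\alpha \\
\alpha^{2} = 4\alpha + 3 & \alpha^{8} = 3\alpha + 1 & \alpha^{14} = \alpha + 2 & \alpha^{20} = 2\alpha + 4 \\
\alpha^{3} = \underline{\qquad} & \alpha^{9} = \underline{\qquad} & \alpha^{15} = \underline{\qquad} & \alpha^{21} = \underline{\qquad} \\
\alpha^{4} = 3\alpha + 2 & \alpha^{10} = \alpha + 4 & \alpha^{16} = 2\alpha + 3 & \alpha^{22} = 4\alpha + 1 \\
\alpha^{5} = 4\alpha + 4 & \alpha^{11} = 3\alpha + 3 & \alpha^{17} = \alpha + 1 & \alpha^{23} = 2\alpha + 2 \\
\alpha^{6} = 2 & \alpha^{12} = 4 & \alpha^{18} = 3 & \alpha^{24} = 1
\end{array}$$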
(2) Write $[x]$ in the form $a\alpha + b$, with $a, b \in \mathbb{F}_5$.
(3) Write $[x^3 + 2x + 4]$ in the form $a\alpha + b$, with $a, b \in \mathbb{F}_5$.
(4) Find at least 2 zeroes in $K$ of $g = t^8 + t^4 + 1 \in K[t]$. (Hint: factor $y^3 - 1$ and $t^{12} - 1$.)
(5) Why is the ring $F = \mathbb{F}_{19}[x]/(x^2 + 5)$ a field?
(6) Find the two square roots of 15 in the field $F$.
(7) Find the zeroes in $F$ of the polynomial $f(y) = y^2 + 3y + 8 \in F[y]$ by using the quadratic formula
$$y = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
Friendly advice: plug your answer into $f(y)$ and check you get zero!
(8) Give an explicit isomorphism $\theta: F \to \mathbb{F}_{19}[t]/(t^2 + 1)$.


10. DIVISIBILITY AND FACTORIZATION 


The notion of division makes sense in any ring, and much of the initial impetus for the development of abstract algebra arose from problems of division and factorization, especially in rings closely related to the integers such as $\mathbb{Z}[\sqrt{d}]$. Division and factorization in polynomial rings are also of great importance.


Definition 10.1. Let $a$ and $b$ be elements of a commutative ring $R$. We say that $a$ divides $b$ in $R$ if $b = ar$ for some $r \in R$. We then call $b$ a multiple of $a$ and write $a \mid b$.
Every element divides zero.

Zero divides no elements other than itself. At the other end of the spectrum, 1 divides every element. But 1 is not the only element with this property; a unit $u$ divides every element because $b = u(u^{-1}b)$. Conversely, if $u$ divides every element of $R$, then in particular $u \mid 1$, so $u$ is a unit.


10.1. Greatest common divisors. Let $R$ be a domain. A greatest common divisor of two elements $a, b \in R$ is an element $d \in R$ such that

(1) $d \mid a$ and $d \mid b$, and

(2) if $e \mid a$ and $e \mid b$, then $e \mid d$.
We write $d = \gcd(a,b)$, or just $d = (a,b)$. We say that greatest common divisors exist in $R$ if every pair of elements in $R$ has a greatest common divisor in $R$.

The greatest common divisor is not unique. For example, in the ring of integers, both 2 and $-2$ are greatest common divisors of 6 and 10. Similarly, in $\mathbb{Z}[i]$ both 2 and $2i$ are greatest common divisors of 4 and 6.


Lemma 10.2. Let $R$ be a domain. If $d$ and $d'$ are greatest common divisors of $a$ and $b$, then each is a unit multiple of the other.

Proof. Because $d$ and $d'$ divide each other, we have $d' = du$ and $d = d'v$, for some elements $u$ and $v$. Hence $duv = d$. If $d = 0$, then $d' = du = 0$ as well and there is nothing to prove. Otherwise, because $R$ is a domain and $d$ is non-zero, we may cancel to get $uv = 1$.


To obtain uniqueness of a greatest common divisor we need some additional 
structure on R. For example, in Z if we also insist that the greatest common 
divisor be positive, then it becomes unique. 

Actually, we haven't even shown that greatest common divisors exist in $\mathbb{Z}$ or $\mathbb{Z}[\sqrt{d}]$. There is something to do here.

We can define the greatest common divisor of any collection of elements by saying that $d$ is a greatest common divisor of $a_1, \dots, a_n$ if it divides each $a_i$, and if $e$ is any element of $R$ dividing all of them, then $e$ necessarily divides $d$.

Exercise. Sometimes we write $(a,b)$ for the greatest common divisor of two integers $a$ and $b$. This notation is also used to denote the ideal generated by $a$ and $b$. For some rings there is an equality of ideals, $(a,b) = (d)$, when $d$ is a greatest common divisor of $a$ and $b$.

Show that 1 is a greatest common divisor of 2 and $x$ in $\mathbb{Z}[x]$. What is the greatest common divisor of $x$ and $y$ in $\mathbb{C}[x,y]$?


10.2. Primes and Irreducibles. 


Definition 10.3. Let $R$ be a commutative ring. A non-zero non-unit $a \in R$ is
irreducible if in every factorization $a = bc$ either $b$ or $c$ is a unit;
prime if whenever $a \mid bc$ either $a \mid b$ or $a \mid c$.


Lemma 10.4. In a commutative domain every prime is irreducible.

Proof. Let $p$ be prime. If $p = bc$, then $p \mid bc$, so, perhaps after relabelling the factors, $p \mid b$. Thus $b = pu$ and $p = puc$; since $p \ne 0$ we can cancel in a domain, so $1 = uc$, whence $c$ is a unit.


The converse of this lemma is not always true: an irreducible need not be prime 
(see Example 10.8 below). 
In order to give such an example we introduce some more general considerations. 
10.3. Quadratic extensions of the integers. Let $d$ be a non-square integer. The ring
$$\mathbb{Z}[\sqrt{d}] := \{\, a + b\sqrt{d} \mid a, b \in \mathbb{Z} \,\}$$
is called a quadratic extension of the integers.

Rings such as these have played, and continue to play, a central role in number theory.

Factorization and divisibility questions in $\mathbb{Z}[\sqrt{d}]$ are tackled by making use of what one knows about factorization and divisibility in $\mathbb{Z}$, and this is done by making use of the norm function: the norm of an element $x = a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]$ is
$$N(x) := a^2 - b^2 d.$$
If $d$ is a negative integer the norm of an element $x$ in $\mathbb{Z}[\sqrt{d}]$ is equal to $x\bar{x} = |x|^2$, where $\bar{x}$ is its complex conjugate.
The two fundamental properties of the norm are given in the next lemma, the proof of which is trivial (notice that the proof of (1) uses the fact that $d$ is not a square).


Lemma 10.5. Let $x, y \in \mathbb{Z}[\sqrt{d}]$. Then
(1) $N(x) = 0 \iff x = 0$.
(2) $N(xy) = N(x)N(y)$.


Because the norm is an integer, a factorization $a = xy$ in $\mathbb{Z}[\sqrt{d}]$ implies the factorization $N(a) = N(x)N(y)$ in $\mathbb{Z}$. This provides a tool for studying factorization questions in $\mathbb{Z}[\sqrt{d}]$.
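Both properties in Lemma 10.5, multiplicativity in particular, are easy to test numerically. In the Python sketch below (our own encoding, not notation from the text), an element $a + b\sqrt{d}$ is stored as the integer pair `(a, b)`.

```python
def mul(x, y, d):
    """Product of x = (a, b) and y = (c, e), i.e. of a + b*sqrt(d) and c + e*sqrt(d)."""
    a, b = x
    c, e = y
    return (a * c + b * e * d, a * e + b * c)

def norm(x, d):
    """N(a + b*sqrt(d)) = a^2 - b^2 d."""
    a, b = x
    return a * a - b * b * d

x, y = (1, 1), (1, -1)           # 1 + sqrt(-5) and 1 - sqrt(-5)
print(mul(x, y, -5))             # (6, 0)
print(norm(x, -5), norm(y, -5))  # 6 6
```

Here $N(6) = 36 = 6 \cdot 6$, as Lemma 10.5(2) predicts for the factorization $6 = (1 + \sqrt{-5})(1 - \sqrt{-5})$.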


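Lemma 10.6. Let $d$ be a negative integer.
(1) The element $x = a + b\sqrt{d}$ is a unit in $\mathbb{Z}[\sqrt{d}]$ if and only if $N(x) = 1$.
(2) The units in $\mathbb{Z}[i]$ are $\{\pm 1, \pm i\}$.
(3) If $d \ne -1$, the units in $\mathbb{Z}[\sqrt{d}]$ are $\{\pm 1\}$.
Proof. Since $d < 0$, $N(x) \ge 0$. Certainly, if $x$ is a unit, then $1 = N(1) = N(xx^{-1}) = N(x)N(x^{-1})$, and since both factors are non-negative integers we conclude that $N(x) = 1$. Conversely, suppose that $N(x) = 1$. Then $x \ne 0$, and it has an inverse in $\mathbb{C}$, namely
$$x^{-1} = \frac{1}{a + b\sqrt{d}} = \frac{a - b\sqrt{d}}{(a + b\sqrt{d})(a - b\sqrt{d})} = \frac{a - b\sqrt{d}}{a^2 - b^2 d} = a - b\sqrt{d}.$$
This belongs to $\mathbb{Z}[\sqrt{d}]$, so $x$ is a unit in $\mathbb{Z}[\sqrt{d}]$.

The only way $a^2 - b^2 d$ can equal 1 is if $a^2 = 1$ and $b = 0$, leading to the units $\pm 1$, or if $a = 0$, $d = -1$ and $b^2 = 1$, leading to the units $\pm i$ in $\mathbb{Z}[i]$.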


Example 10.7. Determining the units in $\mathbb{Z}[\sqrt{d}]$ is more complicated if $d$ is a positive integer. For example, $1 + \sqrt{2}$ and $1 - \sqrt{2}$ are units in $\mathbb{Z}[\sqrt{2}]$, and $2 \pm \sqrt{5}$ are units in $\mathbb{Z}[\sqrt{5}]$.
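Example 10.8. An irreducible need not be prime. Let $R = \mathbb{Z}[\sqrt{-5}]$. We claim that 2 is irreducible in $R$ but not prime.

It is easy to see that 2 is not prime because although it does not divide either $1 + \sqrt{-5}$ or $1 - \sqrt{-5}$ in $\mathbb{Z}[\sqrt{-5}]$ (the quotients $\tfrac{1}{2} \pm \tfrac{1}{2}\sqrt{-5}$ do not have integer coefficients), it divides their product:
$$(1 + \sqrt{-5})(1 - \sqrt{-5}) = 6 = 2 \cdot 3.$$
To see that 2 is irreducible, suppose that $2 = bc$ where $b, c \in R$. Then
$$4 = N(2) = N(b)N(c).$$
If $b = x + y\sqrt{-5}$, then $N(b) = x^2 + 5y^2$, and since $N(b)$ divides 4 we must have $y = 0$ and $x \in \{\pm 1, \pm 2\}$. In particular no element of $R$ has norm 2, so $N(b) = 1$ or $N(c) = 1$, and by Lemma 10.6 one of $b$, $c$ is a unit. Thus the only factorizations of 2 in $\mathbb{Z}[\sqrt{-5}]$ are $2 = 2 \cdot 1 = (-2) \cdot (-1)$, up to the order of the factors; since one of the factors is a unit, we see that 2 is irreducible.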
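The key fact in this example (and again in Example 10.11) is that no element of $\mathbb{Z}[\sqrt{-5}]$ has norm 2 or 3. Because $N(x + y\sqrt{-5}) = x^2 + 5y^2$ grows with $|x|$ and $|y|$, a finite search settles this. The Python helper below is a hypothetical illustration, valid only for negative $d$.

```python
from math import isqrt

def norms_up_to(bound, d=-5):
    """Set of norms N(x + y*sqrt(d)) = x^2 - d*y^2 that are at most `bound`.

    Assumes d < 0, so both terms are non-negative and |x|, |y| <= sqrt(bound)
    covers every element whose norm can be at most `bound`.
    """
    r = isqrt(bound)
    values = set()
    for x in range(-r, r + 1):
        for y in range(-r, r + 1):
            n = x * x - d * y * y
            if n <= bound:
                values.add(n)
    return values

print(sorted(norms_up_to(10)))   # [0, 1, 4, 5, 6, 9]
```

The values 2 and 3 are missing from the output, which is exactly what the irreducibility arguments for 2, 3 and $1 \pm \sqrt{-5}$ rely on.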


Exercise. Show that 2 is not prime in Z[i]. Describe exactly which prime 
integers remain prime in Z[i]. 


10.4. Unique factorization. 


Definition 10.9. A commutative domain $R$ is a unique factorization domain, or UFD, if every non-zero non-unit element of $R$ can be written as a product of irreducible elements, and the irreducibles that occur in such a factorization are unique up to order and multiplication by units.


To see what “uniqueness” means in this definition, consider the factorizations
$$6 = 2 \cdot 3 = (-3) \cdot (-2) = (-3) \cdot (-1) \cdot 2 \cdot (-1) \cdot (-1)$$
in $\mathbb{Z}$. The uniqueness means this: if we have two factorizations of an element as a product of irreducibles, and $x$ is an irreducible appearing in one of those factorizations, then some unit multiple of $x$ must appear in the other factorization.


Lemma 10.10. In a unique factorization domain, primes and irreducibles are the same.

Proof. We observed in Lemma 10.4 that a prime is irreducible.

Suppose that $x$ is an irreducible and that $x \mid bc$. Then $bc = xy$ for some $y$. We can write each of $b$, $c$, and $y$ as a product of irreducibles. Doing so gives two factorizations of $bc$ as a product of irreducibles. By the uniqueness of such a factorization, at least one of the irreducibles in the factorizations of $b$ and $c$ must be a unit multiple of $x$. But that implies that $x$ divides either $b$ or $c$, thus showing that $x$ is prime.


Example 10.11. $\mathbb{Z}[\sqrt{-5}]$ is not a unique factorization domain because, as we showed in Example 10.8, the irreducible element 2 is not prime.

Indeed,
$$6 = 2 \cdot 3 = (1 + \sqrt{-5})(1 - \sqrt{-5})$$
gives two distinct factorizations of 6 as a product of irreducibles (the only units are $\pm 1$, so neither 2 nor 3 is a unit multiple of $1 \pm \sqrt{-5}$). We already showed that 2 is irreducible, and a similar argument shows that 3 is irreducible. To see that $1 + \sqrt{-5}$ is irreducible, write $1 + \sqrt{-5} = ab$ and suppose that $a$ is not a unit. Then $6 = N(1 + \sqrt{-5}) = N(a)N(b)$. Since $x^2 + 5y^2$ never equals 2 or 3, there are no elements in $\mathbb{Z}[\sqrt{-5}]$ having norm 2 or 3, so it must be that $N(a) \in \{1, 6\}$. But $a$ is not a unit, so $N(a) \ne 1$; it follows that $N(a) = 6$, and hence that $N(b) = 1$, so $b$ is a unit.

Thus $1 + \sqrt{-5}$ is irreducible, and a similar argument shows that $1 - \sqrt{-5}$ is irreducible too.


Historical remark. The notion of an ideal entered mathematics as a result of the failure of unique factorization to hold in certain rings. Originally, what we now call an ideal was called an "idealized number". The idea was to work with ideals rather than numbers: i.e., one could ask whether the ideal $(6)$ in $\mathbb{Z}[\sqrt{-5}]$ can be written as a product of prime ideals in a unique way. Of course, one needs to define what one means by a prime ideal for this to make sense. But notice that an integer $p$ is prime if and only if $\mathbb{Z}/(p)$ is a field; so we say that an ideal $\mathfrak{p}$ in $\mathbb{Z}[\sqrt{-5}]$ is prime if $\mathbb{Z}[\sqrt{-5}]/\mathfrak{p}$ is a field.
Notice that $(2)$ is NOT a prime ideal in $\mathbb{Z}[\sqrt{-5}]$. To see this, observe that neither $a = 1 + \sqrt{-5}$ nor $b = 1 - \sqrt{-5}$ is in $(2)$, i.e., neither $a$ nor $b$ is divisible by 2 in $\mathbb{Z}[\sqrt{-5}]$. Hence their images $\bar{a}$ and $\bar{b}$ in $\mathbb{Z}[\sqrt{-5}]/(2)$ are non-zero. However, $ab = 6$ is divisible by 2, so $ab \in (2)$ and this translates into the fact that in $\mathbb{Z}[\sqrt{-5}]/(2)$, $\bar{a}\bar{b} = 0$. Since $\mathbb{Z}[\sqrt{-5}]/(2)$ is not a domain it is not a field.

However, $(2, 1 + \sqrt{-5})$ is a prime ideal in $\mathbb{Z}[\sqrt{-5}]$, and so is $(2, 1 - \sqrt{-5})$. Similarly, the ideals $(3, 1 + \sqrt{-5})$ and $(3, 1 - \sqrt{-5})$ are prime, and we have the following factorization of $(6)$ as a product of prime ideals:
$$(6) = (2, 1 + \sqrt{-5})^2\,(3, 1 + \sqrt{-5})\,(3, 1 - \sqrt{-5}).$$


In fact, every non-zero ideal in $\mathbb{Z}[\sqrt{-5}]$ can be written as a product of prime ideals in a unique way. This result (which actually extends to many other rings) is a very good replacement for the Fundamental Theorem of Arithmetic.

This is typical of how mathematics develops. One has a result (here the Fun- 
damental Theorem of Arithmetic) that is enormously useful and one would like it 
to hold in new situations. Unfortunately it does not, so one modifies the original 
idea in some clever way (here by introducing ideals rather than numbers, and prime 
ideals rather than prime elements) so that a modified version of the original result 
is now true. 

Mathematicians are tricky—we want something to be true so we introduce new 
concepts and ideas so that it is true (or at least an appropriate modified version is 
true). 


A couple of people have asked about the official definition of a prime ideal. It is this: an ideal $\mathfrak{p}$ in a commutative ring $R$ is prime if $R/\mathfrak{p}$ is a domain. Thus, in $\mathbb{Z}$, 0 is a prime ideal. In rings like $\mathbb{Z}[\sqrt{d}]$ it turns out that an ideal $\mathfrak{p}$ is prime if and only if it is either zero or $\mathbb{Z}[\sqrt{d}]/\mathfrak{p}$ is a field.


Example 10.12. The ring $R = k[t, t^{1/2}, t^{1/4}, \dots]$ is a domain in which prime and irreducible elements are the same but it is not a UFD. It fails to be a UFD because some elements, $t$ for example, cannot be written as a product of irreducibles. To see that every irreducible is prime, suppose that $x$ is irreducible and that $x \mid yz$, say $yz = xw$. There is a suitably large $n$ such that $x$, $y$, $z$, and $w$ all belong to $k[t, t^{1/2}, \dots, t^{1/2^n}]$; this subring is equal to $k[t^{1/2^n}]$, which is a polynomial ring in one variable (so a UFD). Since $x$ is still irreducible as an element of $k[t^{1/2^n}]$, it is prime in $k[t^{1/2^n}]$, so it must divide either $y$ or $z$; hence $x$ is prime in $R$.


Some Exercises. 

In all these exercises, the elements $a, b, c, \dots$ belong to a commutative ring $R$. Similarly, $I$ and $J$ denote ideals in a commutative ring $R$. All rings are assumed to have an identity $1 \ne 0$.

(1) Show that $IJ$ is an ideal if $I$ and $J$ are ideals.

(2) If $J$ is an ideal in $R$ containing an ideal $I$, show that $J/I$ is an ideal of $R/I$. Show every ideal of $R/I$ is obtained in this way.

(3) If $J$ is an ideal in $R$ containing an ideal $I$, show that
$$(R/I)/(J/I) \cong R/J.$$
Hint: use Proposition 7.12.

(4) Show that there is a 1-1 correspondence between the ideals in $R/I$ and the ideals in $R$ that contain $I$.

(5) Show that $\mathbb{Z}[\sqrt{-5}]/(1 + \sqrt{-5}) \cong \mathbb{Z}_6$. The way to do this is to construct a surjective ring homomorphism $\phi: \mathbb{Z}[\sqrt{-5}] \to \mathbb{Z}_6$ such that $\ker\phi = (1 + \sqrt{-5})$ and appeal to Proposition 7.12. In particular, you will need $\phi(1 + \sqrt{-5}) = 0$. However, a ring homomorphism must, by definition, send the identity to the identity, so you now know what $\phi(\sqrt{-5})$ must equal. Now, you can figure out how to define $\phi$ on all elements of $\mathbb{Z}[\sqrt{-5}]$ because
$$\phi(x + y\sqrt{-5}) = \phi(x) + \phi(y)\phi(\sqrt{-5}).$$
So, with all these hints, go ahead and prove that $\mathbb{Z}[\sqrt{-5}]/(1 + \sqrt{-5}) \cong \mathbb{Z}_6$. Make sure that when you define $\phi$ you show that it really is a ring homomorphism; the tricky point will be to show that $\phi(ab) = \phi(a)\phi(b)$.

(6) Use the previous problem and the fact that
$$\frac{R}{I + J} \cong \frac{R/I}{(I + J)/I}$$
to show that $\mathbb{Z}[\sqrt{-5}]/(2, 1 + \sqrt{-5}) \cong \mathbb{F}_2$, the field with two elements.

(7) Show that $\mathbb{Z}[\sqrt{-5}]/(3, 1 + \sqrt{-5}) \cong \mathbb{F}_3$, the field with three elements.

(8) Decide whether the following integers remain prime in $\mathbb{Z}[i]$: 2, 3, 5, 7, 11, 13. Do you detect a pattern? Can you conjecture a general result?

(9) Is there a ring homomorphism $\phi: \mathbb{Z}[\sqrt{-5}] \to \mathbb{F}_7$ such that $\phi(\sqrt{-5}) = 2$? Explain.
(10) Is there a ring homomorphism ¢ : Z[\/—5] — Fy, such that 4(./—5) = 2? 
Explain. 


11. ARITHMETIC IN k[x]


In $k[x]$, if we insist that the greatest common divisor of two polynomials be a monic polynomial, meaning that its leading coefficient is one, it becomes unique.


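Proposition 11.1. If $f$ and $g$ are elements of $k[x]$ such that $f$ is non-zero, then there are unique polynomials $q$ and $r$ such that
$$g = fq + r \quad\text{and}\quad \deg r < \deg f.$$

Proof. Existence. We argue by induction on $\deg g$. If $g = 0$, we can take $q = r = 0$. If $\deg g < \deg f$, we can take $q = 0$ and $r = g$. If $m = \deg g \ge \deg f = n$, we can write
$$g = \alpha x^m + \text{lower degree terms},$$
$$f = \beta x^n + \text{lower degree terms}.$$
Since
$$\deg(g - \alpha\beta^{-1}x^{m-n}f) < \deg g,$$
we may apply the induction hypothesis to $g - \alpha\beta^{-1}x^{m-n}f$: it equals $fq' + r$ with $\deg r < \deg f$, and then $g = f(q' + \alpha\beta^{-1}x^{m-n}) + r$.

Uniqueness. If $g = fq + r = fq' + r'$, then $f(q - q') = r' - r$. But $\deg(r' - r) < \deg f$, while every non-zero multiple of $f$ has degree at least $\deg f$, so $r' - r = 0$. Hence $f(q - q') = 0$, and since $f \ne 0$ and $k[x]$ is a domain, $q' = q$ also.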
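The existence proof is an algorithm (polynomial long division). Here is a minimal Python sketch over $k = \mathbb{Q}$, assuming coefficient lists with the lowest degree first; `divmod_poly` and `trim` are our own names.

```python
from fractions import Fraction

def trim(p):
    """Remove trailing zero coefficients; [] represents the zero polynomial."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(g, f):
    """Return (q, r) with g = f*q + r and deg r < deg f.

    Polynomials are coefficient lists over Q, lowest degree first; f != 0.
    Follows the existence proof: repeatedly subtract (alpha/beta) x^(m-n) f.
    """
    f = trim(f)
    if not f:
        raise ZeroDivisionError("f must be non-zero")
    n, beta = len(f) - 1, Fraction(f[-1])
    r = [Fraction(c) for c in trim(g)]
    q = [Fraction(0)] * max(len(r) - n, 1)
    while len(r) - 1 >= n:
        m, alpha = len(r) - 1, r[-1]
        c = alpha / beta
        q[m - n] += c
        for i, fc in enumerate(f):
            r[m - n + i] -= c * fc   # kills the x^m term of r
        r = trim(r)
    return trim(q), r
```

For instance, `divmod_poly([-2, 0, 1], [1, 2])` returns the quotient $\tfrac{1}{2}x - \tfrac{1}{4}$ and the remainder $-\tfrac{7}{4}$, the same division that appears in Section 8.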


Proposition 11.2. Every pair of non-zero elements in $k[x]$ has a greatest common divisor.


Proof. To prove this, we need to introduce the Euclidean algorithm. The Euclidean 
algorithm is a constructive method that produces the greatest common divisor of 
two polynomials, as we now show. 
The Euclidean algorithm. Let $f$ and $g$ be elements of $k[x]$ with $f$ non-zero. By repeatedly using Proposition 11.1 we may write
$$\begin{aligned}
g &= fq_1 + r_1 && \text{with } \deg r_1 < \deg f, \\
f &= r_1q_2 + r_2 && \text{with } \deg r_2 < \deg r_1, \\
r_1 &= r_2q_3 + r_3 && \text{with } \deg r_3 < \deg r_2, \\
&\;\;\vdots
\end{aligned}$$
Since the degrees of the remainders $r_i$ are strictly decreasing, this process must stop, and stopping means that the remainder must eventually be zero. If $r_{t+2} = 0$, and we set $r_{-1} = g$ and $r_0 = f$, then the general equation becomes
$$r_i = r_{i+1}q_{i+2} + r_{i+2} \quad \text{with } \deg r_{i+2} < \deg r_{i+1}, \tag{11-1}$$
and the last equation becomes
$$r_t = r_{t+1}q_{t+2}.$$

Claim: $r_{t+1} = \gcd(f,g)$. Proof: Since $r_{t+1}$ divides $r_t$, it follows from (11-1) that $r_{t+1}$ also divides $r_{t-1}$. By descending induction, (11-1) implies that $r_{t+1}$ divides all $r_i$, $i \ge -1$. In particular, $r_{t+1}$ divides $f$ and $g$. On the other hand, if $e$ divides both $f$ and $g$, then it divides $r_1 = g - fq_1$. If $e$ divides $r_i$ and $r_{i+1}$, then it follows from (11-1) that it also divides $r_{i+2}$. By induction, $e$ divides $r_{t+1}$. Hence $r_{t+1}$ is a greatest common divisor of $f$ and $g$.

This procedure for finding the greatest common divisor of $f$ and $g$ is called the Euclidean algorithm. It completes the proof of Proposition 11.2.
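The Euclidean algorithm translates directly into code. The Python sketch below works over $k = \mathbb{Q}$ with exact fractions and coefficient lists (lowest degree first); the helper names are ours, and returning the monic gcd follows the normalisation mentioned at the start of this section.

```python
from fractions import Fraction

def trim(p):
    """Convert to Fractions and drop trailing zeros ([] is the zero polynomial)."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def remainder(g, f):
    """Remainder of g on division by a non-zero f (Proposition 11.1)."""
    f, r = trim(f), trim(g)
    n = len(f) - 1
    while len(r) - 1 >= n:
        c, shift = r[-1] / f[-1], len(r) - 1 - n
        for i, fc in enumerate(f):
            r[shift + i] -= c * fc
        r = trim(r)
    return r

def poly_gcd(f, g):
    """Euclidean algorithm: the last non-zero remainder, made monic."""
    a, b = trim(g), trim(f)          # r_{-1} = g, r_0 = f
    while b:
        a, b = b, remainder(a, b)    # r_{i+2} = remainder of r_i by r_{i+1}
    if not a:
        return []                    # both inputs were zero
    return [c / a[-1] for c in a]    # normalise to the monic gcd
```

For example, `poly_gcd([-2, 1, 1], [3, -4, 1])` computes $\gcd(x^2 + x - 2,\ x^2 - 4x + 3) = x - 1$, returned as the coefficient list for $-1 + x$.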


If $K$ is a field containing $k$, then $K[x]$ contains $k[x]$. Hence, if $f$ and $g$ belong to $k[x]$, we can ask for their greatest common divisor in $k[x]$, and for their greatest common divisor in $K[x]$. These are the same (once both are normalized to be monic). This is because the uniqueness of $q$ and $r$ in Proposition 11.1 ensures that carrying out the Euclidean algorithm in $k[x]$ for a pair $f, g \in k[x]$ produces exactly the same result as carrying out the Euclidean algorithm in $K[x]$ for that pair.


Proposition 11.3. Let $d$ be a greatest common divisor in $k[x]$ of non-zero elements $f$ and $g$. Then $d = af + bg$ for some $a, b \in k[x]$.

Proof. Since a greatest common divisor is unique up to a non-zero scalar multiple, we can assume that $d = r_{t+1}$, the last non-zero remainder produced by the Euclidean algorithm. Working backwards through (11-1), we have
$$r_{t+1} = r_{t-1} - r_t q_{t+1} = r_{t-1} - (r_{t-2} - r_{t-1}q_t)\,q_{t+1} = \cdots,$$
and so on. Eventually we obtain an expression in which every term is a multiple of either $r_0 = f$ or $r_{-1} = g$. Hence the result.
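The back-substitution in this proof is usually organised as the extended Euclidean algorithm, which carries along, for every remainder, its expression as a combination of $f$ and $g$. Here is a Python sketch over $\mathbb{Q}$ (our own helper names; coefficient lists lowest degree first; it assumes $f$ and $g$ are not both zero).

```python
from fractions import Fraction

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def divmod_poly(g, f):
    """(q, r) with g = f*q + r and deg r < deg f, for f != 0."""
    f, r = trim(f), trim(g)
    n = len(f) - 1
    q = [Fraction(0)] * max(len(r) - n, 1)
    while len(r) - 1 >= n:
        c, shift = r[-1] / f[-1], len(r) - 1 - n
        q[shift] += c
        for i, fc in enumerate(f):
            r[shift + i] -= c * fc
        r = trim(r)
    return trim(q), r

def extended_poly_gcd(f, g):
    """Return (d, a, b) with d the monic gcd of f and g and d = a*f + b*g."""
    # Invariant: r0 = a0*f + b0*g and r1 = a1*f + b1*g.
    r0, a0, b0 = trim(g), [], [Fraction(1)]   # r_{-1} = g = 0*f + 1*g
    r1, a1, b1 = trim(f), [Fraction(1)], []   # r_0 = f = 1*f + 0*g
    while r1:
        q, r2 = divmod_poly(r0, r1)
        minus_q = [-c for c in q]
        a2, b2 = add(a0, mul(minus_q, a1)), add(b0, mul(minus_q, b1))
        r0, a0, b0, r1, a1, b1 = r1, a1, b1, r2, a2, b2
    lead = r0[-1]
    return ([c / lead for c in r0], [c / lead for c in a0], [c / lead for c in b0])
```

Applied to $f = x^3 + 1$ and $g = x^2 - 2$ from Section 8, `extended_poly_gcd([1, 0, 0, 1], [-2, 0, 1])` returns $d = 1$, $a = \tfrac{1}{7}(2x - 1)$ and $b = -\tfrac{1}{7}(2x^2 - x + 4)$.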


Let $f \in k[x]$. We write $(f)$ for the set of all multiples of $f$. That is,
$$(f) = \{\, fg \mid g \in k[x] \,\}.$$

It is clear that $(f)$ contains zero. The sum and difference of two multiples of $f$ are multiples of $f$. Any multiple of a multiple of $f$ is a multiple of $f$. Hence $(f)$ is an ideal of $k[x]$. We call it the principal ideal generated by $f$.


Theorem 11.4. Every ideal in $k[x]$ is principal.

Proof. The zero ideal consists of all multiples of zero, so is principal. If $I$ is a non-zero ideal, choose a non-zero element $f$ in it of minimal degree. Clearly $(f) \subseteq I$. If $g$ is an element of $I$, we may write $g = fq + r$ with $\deg r < \deg f$. However, $r$ equals $g - fq$, so belongs to $I$; because the degree of $f$ was minimal, we conclude that $r = 0$. Hence $g \in (f)$. Thus $I = (f)$.


Notice that $(f)$ is generated by $\lambda f$ if $\lambda$ is a non-zero element of $k$. Conversely, if $(f) = (g)$, then $g$ and $f$ must be multiples of each other, so $g = \lambda f$ for some non-zero $\lambda$ in $k$. Hence, if $I$ is a non-zero ideal in $k[x]$, there is a unique monic polynomial $f$ such that $I = (f)$.


Proposition 11.5. The following conditions on a non-zero, non-unit $f \in k[x]$ are equivalent:

(1) $f$ is irreducible;

(2) $(f)$ is a maximal ideal;

(3) $k[x]/(f)$ is a field.

Proof. By Lemma 7.17, conditions (2) and (3) are equivalent, so we only need prove the equivalence of (1) and (2).

(1) $\Rightarrow$ (2) Suppose that $f$ is irreducible. Since $f$ is not a unit, $(f) \ne k[x]$. Suppose that $(f) \subseteq (a) \ne k[x]$. Then $f = ab$ for some $b$, and $a$ is not a unit because $1 \notin (a)$; because $f$ is irreducible, $b$ must be a unit. Thus $a = fb^{-1}$, so every multiple of $a$ is a multiple of $f$, whence $(a) \subseteq (f)$, and we deduce that $(f) = (a)$, showing that $(f)$ is maximal.

(2) $\Rightarrow$ (1) Suppose that $f = ab$; we must show that either $a$ or $b$ is a unit. Suppose that $a$ is not a unit. Then $(a) \ne k[x]$. But every multiple of $f$ is a multiple of $a$, so $(f) \subseteq (a)$, and the hypothesis that $(f)$ is maximal implies that $(f) = (a)$. In particular, $a = fu$ for some $u \in k[x]$, whence $f = ab = fub$ and $1 = ub$ because we can cancel in a domain. Thus $b$ is a unit, showing that $f$ is irreducible.


The simplest illustration of Proposition 11.5 is provided by $k[x]/(x - \lambda)$ where
$\lambda \in k$. Every polynomial of degree one is irreducible, so $k[x]/(x - \lambda)$ is a field.
Which field? It is $k$ because $(x - \lambda)$ is the kernel of the surjective evaluation homomorphism

$$\varepsilon: k[x] \to k, \qquad \varepsilon(f) = f(\lambda).$$


Example 11.6. Let $f \in \mathbb{R}[x]$ be a monic polynomial of degree two. If $f$ has a real
zero, it is not irreducible in $\mathbb{R}[x]$. Suppose $f$ has no real zero. Then it is irreducible
in $\mathbb{R}[x]$, so $\mathbb{R}[x]/(f)$ is a field. Which field? It is $\mathbb{C}$ because, if $\alpha \in \mathbb{C}$ is a zero of
$f$, $(f)$ is the kernel of the evaluation homomorphism $\varepsilon: \mathbb{R}[x] \to \mathbb{C}$, $\varepsilon(g) = g(\alpha)$.
(Why is $\varepsilon$ surjective?)

It is perhaps good for your health to see explicitly which element(s) of $\mathbb{R}[x]/(f)$
square to $-1$. The way to do this is to complete the square: if $f = x^2 + 2bx + c$,
then in $\mathbb{R}[x]/(f)$ we have
$$(x + b)^2 = b^2 - c;$$
because $f$ has no real zero, $b^2 - c < 0$, whence $\sqrt{c - b^2} \in \mathbb{R}$; so the square of the
image in $\mathbb{R}[x]/(f)$ of $(x + b)/\sqrt{c - b^2}$ is $-1$. ◇


Proposition 11.5 provides a huge source of fields. For example, if $d \in \mathbb{Q}$ is not a
square, then $\mathbb{Q}[x]/(x^2 - d)$ is isomorphic to $\mathbb{Q}(\sqrt{d})$.
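Arithmetic in $\mathbb{Q}[x]/(x^2 - d)$ reduces to the rule $x^2 = d$, and the field property can be seen explicitly: the inverse of $a + bx$ is $(a - bx)/(a^2 - db^2)$, whose denominator is non-zero when $d$ is not a square and $a + bx \neq 0$. A minimal Python sketch with exact rational arithmetic; the class name `QuadExt` is hypothetical.

```python
from fractions import Fraction


class QuadExt:
    """The element a + b*x of Q[x]/(x^2 - d); the image of x plays the role of sqrt(d)."""

    def __init__(self, a, b, d):
        self.a, self.b, self.d = Fraction(a), Fraction(b), d

    def __mul__(self, other):
        # (a + bx)(c + ex) = ac + (ae + bc)x + be x^2, and x^2 = d in the quotient ring
        a, b, c, e = self.a, self.b, other.a, other.b
        return QuadExt(a * c + self.d * b * e, a * e + b * c, self.d)

    def inverse(self):
        # (a + bx)(a - bx) = a^2 - d b^2, non-zero for a non-zero element when d is not a square
        norm = self.a ** 2 - self.d * self.b ** 2
        return QuadExt(self.a / norm, -self.b / norm, self.d)

    def __eq__(self, other):
        return (self.a, self.b, self.d) == (other.a, other.b, other.d)

    def __repr__(self):
        return f"{self.a} + {self.b}*x"


u = QuadExt(1, 1, 2)          # 1 + sqrt(2)
print(u.inverse())            # -1 + 1*x, i.e. sqrt(2) - 1
print(u * u.inverse())        # 1 + 0*x
```

Calling `inverse` on the zero element divides by zero, mirroring the fact that $0$ has no inverse in any field.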


Algebraic and transcendental elements. Let $K$ be a field and $k$ a subfield
of $K$. An element $\alpha \in K$ is said to be algebraic over $k$ if it is a zero of a non-zero
polynomial with coefficients in $k$. That is, if
$$\lambda_n \alpha^n + \lambda_{n-1}\alpha^{n-1} + \cdots + \lambda_1 \alpha + \lambda_0 = 0$$
for some $\lambda_0, \ldots, \lambda_n \in k$, not all zero. Equivalently, $\alpha$ is algebraic over $k$ if and only
if the homomorphism $\varepsilon: k[x] \to K$ given by $\varepsilon(f) = f(\alpha)$ is not injective.

If $\alpha$ is not algebraic over $k$ we say it is transcendental over $k$.

We say that k is algebraically closed if the only elements algebraic over k (whatever 
K may be) are the elements of k itself. 


Proposition 11.7. Let $k$ be a field. The following are equivalent:
(1) $k$ is algebraically closed;
(2) the only irreducible polynomials in $k[x]$ are the degree one polynomials;
(3) every polynomial in $k[x]$ of positive degree has a zero in $k$.


Exercise. Factor x’ — 1 as a product of irreducible polynomials in F2[z], F3[2], 
and F [2]. 


12. ZEROES OF POLYNOMIALS 


One of the great motivating problems for the development of algebra was the 
question of finding the zeroes, or roots, of a polynomial in one variable. 

The question of whether an element $a \in k$ is a zero of a polynomial $f \in k[x]$
can be expressed formally as follows: is $f$ in the kernel of the ring homomorphism
$\varepsilon_a: k[x] \to k$ defined by
$$\varepsilon_a(f) = f(a)?$$
You should check that $\varepsilon_a$ is a ring homomorphism; indeed, the ring structure on
$k[x]$ is defined just so this is a homomorphism. The kernel of $\varepsilon_a$ is an ideal, and
obviously contains $x - a$ and therefore the ideal $(x - a)$. However, $(x - a)$ is a
maximal ideal, and $\ker \varepsilon_a$ is a proper ideal because $\varepsilon_a(1) = 1$, so $\ker \varepsilon_a = (x - a)$. We therefore have the following result.


Lemma 12.1. If $f \in k[x]$, then $x - a$ divides $f$ if and only if $f(a) = 0$.


Definition 12.2. Let $a \in k$ and $0 \neq f \in k[x]$. We say that $a$ is a zero of $f$ of
multiplicity $n$ if $(x - a)^n$ divides $f$ but $(x - a)^{n+1}$ does not. ◇

Proposition 12.3. Let $f$ be a monic polynomial in $k[x]$. If $a_1, \ldots, a_r$ are the
distinct zeroes of $f$, and $a_i$ is a zero of multiplicity $n_i$, then
$$f = (x - a_1)^{n_1} \cdots (x - a_r)^{n_r}\, g$$
where $g$ is a polynomial having no zeroes in $k$.


Proof. We argue by induction on the number of zeroes and multiplicity, cancelling 
a factor of the form x — a at each step. 
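Lemma 12.1 and Definition 12.2 can be tested by repeated division by $x - a$, which is exactly the cancelling step in this proof. A minimal Python sketch, assuming coefficients in $\mathbb{Q}$ (handled with `fractions.Fraction`) and coefficient lists written highest degree first; `synthetic_division` and `multiplicity` are hypothetical helper names. Horner's rule produces the quotient and, as its last value, the remainder, which equals $f(a)$.

```python
from fractions import Fraction


def synthetic_division(f, a):
    """Divide f (coefficients, highest degree first) by x - a.

    Returns (quotient, remainder); the remainder is f(a), as in Lemma 12.1."""
    acc = Fraction(0)
    out = []
    for c in f:
        acc = acc * a + c          # Horner's rule
        out.append(acc)
    return out[:-1], out[-1]


def multiplicity(f, a):
    """Largest n such that (x - a)^n divides the non-zero polynomial f."""
    n = 0
    q, r = synthetic_division(f, a)
    while r == 0 and len(f) > 1:
        n += 1                     # cancel one factor x - a
        f = q
        q, r = synthetic_division(f, a)
    return n


# x^3 - 3x + 2 = (x - 1)^2 (x + 2)
print(multiplicity([1, 0, -3, 2], 1))   # 2
print(multiplicity([1, 0, -3, 2], -2))  # 1
```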


The next result and its corollary are among my favorite results in mathematics;
the proof is very devious. It shows that if $f$ is a non-constant polynomial with
coefficients in a field $k$, then there is a larger field $K$ in which $f$ has a zero (apply it
to an irreducible factor of $f$). Of course, the first example that comes to mind is the
polynomial $x^2 + 1$, in which case $\mathbb{C}$, the field of complex numbers, contains a zero of
the polynomial. However, notice that the proof is essentially a tautology.

A field K is called an extension of a field k if k is a subfield of K. 


Theorem 12.4. Let $f \in k[x]$ be an irreducible polynomial. Then there is a field
$K \supseteq k$ and an element $\alpha \in K$ such that $f(\alpha) = 0$.
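Proof. Since $f$ is irreducible, $K := k[x]/(f)$ is a field. Let $\alpha = \bar{x} = x + (f)$ be the
image of $x$ in $K$. If $f = \sum_i a_i x^i$, then, computing in $K$, we have

$$f(\alpha) = \sum_i a_i \alpha^i = \sum_i \big(a_i + (f)\big)\big(x + (f)\big)^i = \Big(\sum_i a_i x^i\Big) + (f) = f + (f) = 0$$

as claimed.
Alternatively, let $\pi: k[x] \to K$ denote the natural map, and write $\alpha = \pi(x)$.
Then, because $\pi$ is a ring homomorphism that is the identity on $k$,

$$f(\alpha) = \sum_i a_i\, \pi(x)^i = \pi\Big(\sum_i a_i x^i\Big) = \pi(f) = 0.$$

Hence $\alpha$ is a zero of $f$.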
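To see this construction in a small case, take $k = \mathbb{F}_2$ and $f = x^2 + x + 1$ (our choice of example; $f$ is irreducible because it has degree 2 and no zero in $\mathbb{F}_2$). The Python sketch below, with hypothetical helper names, stores an element $a + b\alpha$ of $K = \mathbb{F}_2[x]/(f)$ as a pair $(a, b)$ and uses the relation $\alpha^2 = \alpha + 1$ that holds in $K$. For this particular $f$ both zeroes already lie in $K$, so $f$ splits completely without further extension; that is a feature of the example, not a general fact.

```python
# K = F_2[x]/(f) with f = x^2 + x + 1. The pair (a, b) stands for a + b*alpha,
# where alpha is the image of x; in K we have alpha^2 = alpha + 1 (signs do not matter in F_2).

def add(u, v):
    return ((u[0] + v[0]) % 2, (u[1] + v[1]) % 2)


def mul(u, v):
    a, b = u
    c, d = v
    # (a + b alpha)(c + d alpha) = ac + (ad + bc) alpha + bd alpha^2, and alpha^2 = alpha + 1
    return ((a * c + b * d) % 2, (a * d + b * c + b * d) % 2)


def f_of(t):
    """Evaluate f(t) = t^2 + t + 1 inside K."""
    return add(add(mul(t, t), t), (1, 0))


one, alpha = (1, 0), (0, 1)
roots = [t for t in [(0, 0), (1, 0), (0, 1), (1, 1)] if f_of(t) == (0, 0)]
print(roots)  # [(0, 1), (1, 1)], i.e. alpha and alpha + 1
```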


Corollary 12.5. Let $f \in k[x]$ be a monic polynomial of degree $n$. Then there is a field $K \supseteq k$
and elements $\alpha_i \in K$ such that $f = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_n)$.


Proof. We argue by induction on the degree of $f$. If $\deg f = 1$ there is nothing to
prove. Write $f = gh$ where $g$ is irreducible. Then $L := k[x]/(g)$ is a field and there
is an $\alpha \in L$ such that $g = (x - \alpha)g'$ for some $g' \in L[x]$. Now consider $f$ as an
element in $L[x]$. As such it factors, say $f = (x - \alpha)f'$, with $f'$ monic of degree $\deg f - 1$. By induction there is a field
$K \supseteq L$ such that $f'$ is a product of linear factors in $K[x]$. Hence $f$ is a product of
linear terms in $K[x]$.


13. PRINCIPAL IDEAL DOMAINS 


Recall that every ideal in $\mathbb{Z}$ is of the form $(d)$ for some $d$. Similarly, every ideal
in $k[x]$ is of the form $(f)$ (Theorem 11.4).
An ideal of the form $(r)$ in a ring $R$ is said to be principal.


Definition 13.1. A principal ideal domain is a domain in which every ideal is principal,
i.e., every ideal consists of multiples of a single element. ◇


Using the Euclidean algorithm, or more precisely division with remainder, is the standard
method to show that a ring is a principal ideal domain. The argument in Theorem 11.4 is typical.


Proposition 13.2. Let $R$ be a principal ideal domain. Then

(1) greatest common divisors exist in $R$;
(2) if $d = \gcd(a, b)$, then $d = ax + by$ for some $x, y \in R$;
(3) every irreducible in $R$ is prime.


Proof. (1) and (2). The ideal $aR + bR$ is principal, so is equal to $dR$ for some
$d \in R$. Clearly, $d = ax + by$ for some $x, y \in R$, so it remains to show that $d$ is a
greatest common divisor of $a$ and $b$. First, since $a$ and $b$ belong to $dR$, they are
both divisible by $d$. Second, if $e$ divides both $a$ and $b$, then $aR + bR$ is contained
in $eR$, so $d$ is a multiple of $e$. Hence $d$ is a greatest common divisor of $a$ and $b$.

(3) Let $a$ be irreducible, and suppose that $a \mid bc$. To show that $a$ is prime, we
must show it divides either $b$ or $c$. Let $d = ax + by = \gcd(a, b)$. Since $d$ divides
$a$, either $d$ is a unit or $a = du$ with $u$ a unit. But $d \mid b$, so the second alternative
implies that $a \mid b$. Now suppose that $d$ is a unit; since $a$ divides $bc$ it also divides
$acxd^{-1} + bcyd^{-1} = c(ax + by)d^{-1} = c$. Hence $a$ is prime.


Theorem 13.3. Every principal ideal domain is a unique factorization domain. 
Proof. Let $R$ be a PID and $a$ a non-zero non-unit in $R$. We must show that $a$ is a
product of irreducibles in a unique way (up to order and unit factors).

Uniqueness. Suppose that $a = a_1 \cdots a_m = b_1 \cdots b_n$ and that each $a_i$ and $b_j$ is
irreducible. Without loss of generality we can assume that $m \leq n$; we argue by induction on $m$.
By Proposition 13.2, $a_1$ is prime, so $a_1$ divides some $b_j$; relabel the $b_j$'s
so that $a_1 \mid b_1$. Since $a_1$ and $b_1$ are irreducible, $b_1 = a_1 u$ for some unit $u$. Cancelling $a_1$ gives
$a_2 \cdots a_m = (ub_2) \cdots b_n$. If $m = 1$, we would have $1 = (ub_2) \cdots b_n$, so $n$ would have
to be one also (a product of irreducibles is never a unit), and we would be finished. If $m > 1$,
the induction hypothesis applied to $a_2 \cdots a_m = (ub_2) \cdots b_n$ shows that $m = n$ and that,
after relabelling, each $a_i$ is a unit multiple of $b_i$.

Existence. Suppose to the contrary that $a$ is not a product of irreducibles. Then
$a$ is not irreducible, so $a = a_1 b_1$ with $a_1$ and $b_1$ non-units. Since $a$ is not a product
of irreducibles, at least one of $a_1$ and $b_1$ is not a product of irreducibles. Relabelling
if necessary, we can assume that $a_1$ is not a product of irreducibles. Thus $a_1$ is not
irreducible, and we may write $a_1 = a_2 b_2$ with $a_2$ and $b_2$ non-units.

Continuing in this way, we obtain a sequence $a_1, a_2, \ldots$ of elements, none of which is a
product of irreducibles, and factorizations $a_i = a_{i+1} b_{i+1}$ into a product of non-units.
This yields a chain

$$Ra \subseteq Ra_1 \subseteq Ra_2 \subseteq \cdots$$

of ideals. The union of an ascending chain of ideals is an ideal of $R$, and it is a
principal ideal, say $Rz$, by hypothesis. Now $z$ must belong to some $Ra_i$, but then
$Rz \subseteq Ra_i \subseteq Ra_{i+1} \subseteq Rz$, so these ideals are equal. In particular, $a_{i+1} \in Ra_i$, so
$a_{i+1} = a_i u$ for some $u \in R$. It follows that $a_i = a_{i+1} b_{i+1} = a_i u b_{i+1}$, whence $u b_{i+1} = 1$
and $b_{i+1}$ is a unit. This is a contradiction.

We conclude that $a$ must be a product of irreducibles.


Proposition 13.4. Let $f$ be a non-zero element in a principal ideal domain $R$. The
following are equivalent:

(1) $f$ is irreducible;
(2) $(f)$ is a maximal ideal;
(3) $R/(f)$ is a field;
(4) $f$ is a prime.


Proof. Lemma 7.17 shows that conditions (2) and (3) are equivalent. Theorem 13.3
and Lemma 10.10 show that conditions (2) and (4) are equivalent.

(1) ⇒ (2). Suppose $J$ is an ideal of $R$ that contains $(f)$. By hypothesis, $J$ is
principal, say $J = (g)$. Thus $f = gh$ for some $h \in R$. Since $f$ is irreducible, either $g$
is a unit, in which case $J = R$, or $h$ is a unit, in which case $g = fh^{-1}$ and $(g) = (f)$.
Since $f$ is not a unit, $(f) \neq R$, so $(f)$ is maximal.

(2) ⇒ (1). Suppose that $f = gh$. Then $(f) \subseteq (g)$, so either $(g) = R$, in which
case $g$ is a unit, or $(g) = (f)$, in which case $g = fv$ for some $v \in R$; then $f = fvh$, and
cancelling $f$ gives $hv = 1$, so $h$ is a unit. Thus $f$ is irreducible.


14. VECTOR SPACES 


Definition 14.1. Fix a field $k$. An abelian group $(V, +)$ is called a $k$-vector space if
there is an action of $k$ on $V$,
$$k \times V \to V, \qquad (\alpha, v) \mapsto \alpha.v, \text{ or } \alpha v,$$
such that for all $u, v \in V$ and $\alpha, \beta \in k$,

(1) $\alpha(u + v) = \alpha u + \alpha v$,
(2) $(\alpha + \beta)v = \alpha v + \beta v$,
(3) $(\alpha\beta)v = \alpha(\beta v)$,
(4) $1v = v$. ◇

Vector spaces are all around us and you have encountered many.
Let $n \geq 1$. The $n$-dimensional vector space $k^n = k \times \cdots \times k$ is the Cartesian
product of $n$ copies of $k$ with action given by
$$\alpha.(\lambda_1, \ldots, \lambda_n) := (\alpha\lambda_1, \ldots, \alpha\lambda_n).$$


We also define $k^0 = \{0\}$ to be the $k$-vector space consisting of one element. We call
it the zero vector space.
The polynomial ring $k[x]$ is a $k$-vector space with the action defined by
$$\alpha.(\lambda_0 + \lambda_1 x + \cdots + \lambda_n x^n) = \alpha\lambda_0 + \alpha\lambda_1 x + \cdots + \alpha\lambda_n x^n.$$


The principle that ensures $k[x]$ is a $k$-vector space applies to other situations: if
$R$ is any ring that contains $k$ as a subring in such a way that $\alpha r = r\alpha$ for all $\alpha \in k$
and all $r \in R$, then $R$ becomes a $k$-vector space via
$$\alpha.r = \alpha r.$$
The vector space axioms follow from the fact that $R$ is a ring.
For example, the quotient rings $k[x]/I$ are $k$-vector spaces.


Definition 14.2. A subspace $U$ of a $k$-vector space $V$ is a subgroup of $(V, +)$ such
that $\lambda u \in U$ whenever $\lambda \in k$ and $u \in U$. ◇


If $R$ is a ring containing a field $k$, then every ideal $I$ of $R$ is a subspace of $R$.
Similarly, if $I \subseteq J$ are ideals of $R$, then $J/I$ is a subspace of $R/I$ (because it is an
ideal).


Definition 14.3. Let $U$ and $V$ be $k$-vector spaces. A $k$-linear map $f: U \to V$ is a
group homomorphism such that $f(\lambda u) = \lambda f(u)$ for all $\lambda \in k$ and all $u \in U$. If $f$ is,
in addition, bijective we call it an isomorphism of vector spaces and write $U \cong V$.
The inverse of an isomorphism is an isomorphism. ◇


Theorem 14.4. Let $f: U \to V$ be a linear map between two $k$-vector spaces. Then
$\ker f := \{u \in U \mid f(u) = 0\}$ is a subspace of $U$, $\operatorname{im} f$ is a subspace of $V$, and
$U/\ker f \cong \operatorname{im} f$.


If two rings $R$ and $S$ contain copies of the field $k$ and $f: R \to S$ is a ring
homomorphism such that $f(\lambda) = \lambda$ for all $\lambda \in k$, then $f$ is a linear map.


Example 14.5. Let $V = k^n$, $n \geq 1$, and define $t: V \to k$ by
$$t(\alpha_1, \ldots, \alpha_n) = \alpha_1 + \cdots + \alpha_n = \sum_{i=1}^n \alpha_i.$$
It is an easy matter to check that $t$ is a linear map; you should do it.
One can jazz this up. If $w_1, \ldots, w_n$ are any elements of $k$, the map $w: V \to k$
defined by
$$w(\alpha_1, \ldots, \alpha_n) := \sum_{i=1}^n w_i \alpha_i$$
is linear (again, check). We call $w(\alpha_1, \ldots, \alpha_n)$ a weighted sum. The map $t$ in the
previous paragraph is a special case of $w$.

The map $w$ can be described in terms of matrix multiplication, and this is useful
because there are many excellent computer programs for efficient multiplication of
matrices. In this case we have
$$w(\alpha_1, \ldots, \alpha_n) = (\alpha_1 \ \cdots \ \alpha_n)\begin{pmatrix} w_1 \\ \vdots \\ w_n \end{pmatrix} = \sum_{i=1}^n w_i \alpha_i.$$


Example 14.6. The map $w: \mathbb{F}_{11}^{10} \to \mathbb{F}_{11}$ defined by
$$w(\alpha_1, \ldots, \alpha_{10}) = \sum_{i=1}^{10} i\alpha_i$$
is a linear map. You should check this. The kernel of $w$ consists of what are called
ISBNs, International Standard Book Numbers. If you pick a book off your bookshelf
you will find, perhaps on the back, perhaps on one of the early pages, a 10-digit
number, for example, 3-540-60168-6, that uniquely identifies the book. The first
digit identifies the language, the next three the publisher, the next five the book
on that publisher's list, and the last is a check digit. What we mean by that is that
the first 9 digits, $\alpha_1, \ldots, \alpha_9$, are determined by the book, but $\alpha_{10}$ is chosen exactly
so that $(\alpha_1, \ldots, \alpha_{10})$ belongs to $\ker(w)$. This can be done because once $\alpha_1, \ldots, \alpha_9$
are known and viewed as elements of $\mathbb{F}_{11}$, one knows $\alpha_1 + 2\alpha_2 + \cdots + 9\alpha_9 \in \mathbb{F}_{11}$, and
if one now sets $\alpha_{10} := \alpha_1 + 2\alpha_2 + \cdots + 9\alpha_9$, then, because $10 = -1$ in $\mathbb{F}_{11}$,
$\alpha_1 + 2\alpha_2 + \cdots + 9\alpha_9 + 10\alpha_{10} = 0$, so $(\alpha_1, \ldots, \alpha_{10})$ belongs to $\ker(w)$. When you
enter an ISBN into a computer system it checks that it belongs to $\ker(w)$; if you mistype
one of the digits the computer will know that you made an error because what you have
entered will not belong to $\ker(w)$.
It is possible that $\alpha_{10} = 10$, but the letter X is used to denote that element of
$\mathbb{F}_{11}$; for example, 0-521-22909-X is a valid ISBN. ◇
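The check-digit rule is easy to automate. A minimal Python sketch, assuming the ISBN is given as a string in which hyphens are ignored; the function names are hypothetical, and the code checks only the weighted-sum condition described above, not the language or publisher fields.

```python
def isbn10_check_digit(first_nine: str) -> str:
    """alpha_10 = alpha_1 + 2 alpha_2 + ... + 9 alpha_9 in F_11, written 'X' when it is 10."""
    digits = [int(ch) for ch in first_nine if ch.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected nine digits")
    s = sum(i * a for i, a in enumerate(digits, start=1)) % 11
    return "X" if s == 10 else str(s)


def isbn10_is_valid(isbn: str) -> bool:
    """True exactly when (alpha_1, ..., alpha_10) lies in ker(w), w = sum of i * alpha_i over F_11."""
    chars = [ch for ch in isbn if ch.isdigit() or ch in "Xx"]
    if len(chars) != 10 or any(ch in "Xx" for ch in chars[:9]):
        return False
    digits = [10 if ch in "Xx" else int(ch) for ch in chars]
    return sum(i * a for i, a in enumerate(digits, start=1)) % 11 == 0


print(isbn10_check_digit("3-540-60168"))  # 6
print(isbn10_is_valid("0-521-22909-X"))   # True
print(isbn10_is_valid("3-540-60169-6"))   # False
```

The last call changes the ninth digit from 8 to 9, which moves the weighted sum off $\ker(w)$.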
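Example 14.7. Let
$$A = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix}$$
be an $n \times m$ matrix with entries in the field $k$. View the elements of $k^m$ as column
vectors
$$\begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}.$$
Define a linear map $f: k^m \to k^n$ by
$$f(u) = Au$$
for each $u \in k^m$, where $A$ is the matrix above. Explicitly,
$$f(\beta_1, \ldots, \beta_m) = \begin{pmatrix} a_{11} & \cdots & a_{1m} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nm} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}.$$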
The fact that $f$ is a linear map follows easily from the basic properties of matrix
multiplication and addition. Indeed, matrices were introduced, and their product
and sum were defined, just so that every linear map $f: k^m \to k^n$ is of the form
$u \mapsto Au$ for a suitable $n \times m$ matrix $A$.

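We can also describe each linear map $f: k^m \to k^n$ as right multiplication by an
$m \times n$ matrix as opposed to left multiplication by an $n \times m$ matrix. For example,
the map in the previous paragraph is also given by
$$f(\beta_1, \ldots, \beta_m) = (\beta_1 \ \cdots \ \beta_m)\begin{pmatrix} a_{11} & \cdots & a_{n1} \\ \vdots & & \vdots \\ a_{1m} & \cdots & a_{nm} \end{pmatrix}.$$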
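Both conventions compute the same thing: the right-multiplication matrix is the transpose of $A$. A small Python sketch, using integer entries as a stand-in for a field $k$ (an assumption made only to keep the example short) and hypothetical helper names, checks that $Au$ agrees with $u$ times the transpose of $A$.

```python
def mat_vec(A, u):
    """Column-vector convention: (Au)_i = sum_j a_ij * u_j, for an n x m matrix A."""
    return [sum(a * b for a, b in zip(row, u)) for row in A]


def vec_mat(u, B):
    """Row-vector convention: (uB)_j = sum_i u_i * b_ij, for an m x n matrix B."""
    return [sum(u[i] * B[i][j] for i in range(len(u))) for j in range(len(B[0]))]


def transpose(A):
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3],
     [4, 5, 6]]        # n x m with n = 2, m = 3
u = [1, 0, -1]
print(mat_vec(A, u))             # [-2, -2]
print(vec_mat(u, transpose(A)))  # [-2, -2]
```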


15. BASES AND DIMENSION 


Perhaps the most important notion needed for the analysis of a vector space is 
that of a basis, which then leads to the notion of dimension. 


Definition 15.1. A basis for a $k$-vector space $V$ is a set $B = \{v_i \mid i \in I\} \subseteq V$ such
that every $v \in V$ can be expressed in a unique way as
$$v = \sum_{i \in I} \alpha_i v_i$$
for some $\alpha_i \in k$. We do allow the possibility that the index set $I$ is infinite. ◇


Of course, for a given $v$ at most a finite number of the coefficients $\alpha_i$ in the
expression for $v$ are non-zero (there is no way of making sense of an infinite sum in
$V$).

The uniqueness part of the definition is vital. The span of a subset $B$ of $V$ is the
set of $v \in V$ that can be expressed as a finite sum
$$v = \sum \alpha_i v_i \tag{15-1}$$
for some $v_i$'s in $B$ and $\alpha_i$'s in $k$. It is an easy exercise to show that the span of $B$ is
a subspace of $V$. We call an expression of the form (15-1) a linear combination of
the $v_i$'s. $B$ is a basis for the subspace it spans if and only if every element in that
span can be expressed as a linear combination of elements in $B$ in a unique way;
this is equivalent to the condition that if $\sum \alpha_i v_i = 0$ with the $v_i$ distinct elements of $B$,
then all $\alpha_i$ are zero.


Definition 15.2. The dimension of a vector space $V$ is the cardinality of (number of
elements in) a basis for $V$. ◇


The following result, which we will not prove, ensures that this makes sense. 
Proposition 15.3. Any two bases for a vector space have the same cardinality. 


We usually write $\dim V$ for the dimension of $V$, or $\dim_k V$ if we want to emphasize
the field $k$.
The dimension of $k^n$ is $n$. The vectors
$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots, \quad e_n = (0, 0, \ldots, 0, 1)$$
form a basis for $k^n$ because
$$(\lambda_1, \ldots, \lambda_n) = \lambda_1 e_1 + \lambda_2 e_2 + \cdots + \lambda_n e_n,$$
and there is obviously no other way of writing $(\lambda_1, \ldots, \lambda_n)$ as a linear combination
of $e_1, \ldots, e_n$ with coefficients in $k$.


Theorem 15.4. Two k-vector spaces are isomorphic if and only if they have the 
same dimension. 


Proof. If $U$ and $V$ have bases $B$ and $C$ respectively having the same cardinality,
there is a bijection $t: B \to C$. We then define $f: U \to V$ by
$$f\Big(\sum \alpha_i u_i\Big) := \sum \alpha_i\, t(u_i)$$
for any $\alpha_i$'s in $k$ and $u_i$'s in $B$. It is easy to check that $f$ is a linear map, and that
it has an inverse given by
$$f^{-1}\Big(\sum \beta_j v_j\Big) = \sum \beta_j\, t^{-1}(v_j)$$
where the $v_j$'s belong to $C$ and the $\beta_j$'s to $k$.
Conversely, if $f: U \to V$ is an isomorphism and $B$ is a basis for $U$, it is easy to
check that $\{f(u) \mid u \in B\}$ is a basis for $V$; since $f$ is injective, it has the same
cardinality as $B$.


Corollary 15.5. Up to isomorphism, the only vector spaces of finite dimension are
the $k^n$, $n \geq 0$.


Proposition 15.6. If $f$ is a polynomial of degree $n > 0$, then $\dim_k k[x]/(f) = n$,
and the images of $1, x, \ldots, x^{n-1}$ are a basis for $k[x]/(f)$.


Proof. The natural homomorphism $\pi: k[x] \to k[x]/(f)$ sends $k$, the subring of
constant polynomials, to an isomorphic copy of itself in $k[x]/(f)$, so we think of
$k$ as a subring of $k[x]/(f)$. Multiplication in $k[x]/(f)$ therefore gives $k[x]/(f)$ the
structure of a $k$-vector space. Since the powers of $x$ are a basis for $k[x]$, their images
span $k[x]/(f)$.

If $g$ is any element of $k[x]$, then $g = af + r$ for some $a \in k[x]$ and some $r$ that is
either zero or of degree $< n$. Since $\pi(g) = \pi(r)$ and since $r$ is a linear combination of
$1, x, \ldots, x^{n-1}$, the set $\{\pi(x^i) \mid 0 \leq i \leq n-1\}$ spans $k[x]/(f)$. These elements are
linearly independent too because a non-zero multiple of $f$ has degree at least $n$, so the
only linear combination of $1, x, \ldots, x^{n-1}$ that belongs to $(f)$ is
$$0 \cdot 1 + 0 \cdot x + \cdots + 0 \cdot x^{n-1}.$$


Theorem 15.7. If $W$ is a subspace of a finite-dimensional vector space $U$, then
$$\dim \frac{U}{W} = \dim U - \dim W.$$


Amongst other things this says that the dimension of a subspace or quotient of 
U is no larger than that of U itself. 

One trivial consequence of this result (that we use often) is that when W is a 
subspace of a finite-dimensional vector space U, W = U if and only if dim W = 
dim U. 


Consider the following special, but important, case of the theorem. Let $f$ and $g$
be non-zero polynomials in $k[x]$ such that $g$ divides $f$. Then $(f) \subseteq (g)$ and we have
the ring isomorphism
$$\frac{k[x]/(f)}{(g)/(f)} \cong \frac{k[x]}{(g)}.$$
This is also an isomorphism of $k$-vector spaces, so, by Proposition 15.6 and Theorem 15.7,
$$\deg(g) = \dim \frac{k[x]}{(g)} = \dim \frac{k[x]}{(f)} - \dim \frac{(g)}{(f)} = \deg(f) - \dim \frac{(g)}{(f)},$$
from which we see that
$$\dim \frac{(g)}{(f)} = \deg(f) - \deg(g).$$
If we set $n = \deg(f)$ and $d = \deg(f) - \deg(g)$, then $(g)/(f)$ is a $d$-dimensional subspace of
the $n$-dimensional vector space $k[x]/(f)$. This provides an important example of
an $(n, d)$-linear code (see below).
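The dimension formula can be checked by brute force in a small case. The Python sketch below takes $k = \mathbb{F}_2$, $f = x^7 - 1$ and $g = x^3 + x + 1$ (our choice of example; $g$ divides $f$ over $\mathbb{F}_2$), stores polynomials as bit masks (bit $i$ is the coefficient of $x^i$), and collects the images in $k[x]/(f)$ of all multiples $hg$. The formula $\dim (g)/(f) = \deg(f) - \deg(g)$ predicts $7 - 3 = 4$, so there should be $2^4 = 16$ such images. Helper names are hypothetical.

```python
def gf2_mul(a: int, b: int) -> int:
    """Product in F_2[x]; bit i of a mask is the coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # add (XOR) the current shifted copy of a
        a <<= 1
        b >>= 1
    return result


def gf2_mod(a: int, m: int) -> int:
    """Remainder of a on division by the non-zero polynomial m in F_2[x]."""
    dm = m.bit_length() - 1
    while a and a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


f = (1 << 7) | 1      # x^7 + 1, which equals x^7 - 1 over F_2
g = 0b1011            # x^3 + x + 1
print(gf2_mod(f, g))  # 0: g divides f

# Images in k[x]/(f) of the multiples h*g with deg h < 7 (larger h give nothing new).
code = {gf2_mod(gf2_mul(h, g), f) for h in range(1 << 7)}
print(len(code))      # 16 = 2^(7 - 3)
```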


Theorem 15.8. Let $f: U \to V$ be a linear map between two vector spaces. Then
(1) $\ker f$ is a subspace of $U$ and $\operatorname{im} f$ is a subspace of $V$,
(2) $\operatorname{im} f \cong U/\ker f$,
(3) $\dim(\ker f) + \dim(\operatorname{im} f) = \dim U$.
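Part (3) can be checked on an example over $\mathbb{F}_2$. In this Python sketch the $3 \times 4$ matrix $A$ is an arbitrary choice (its third row is the sum of the first two), $\dim \operatorname{im} f$ is computed as the rank of $A$ by Gaussian elimination, and $\ker f$ is found by trying all 16 vectors of $\mathbb{F}_2^4$; since $|\ker f| = 2^{\dim \ker f}$, the dimension is read off from the count. `rank_f2` is a hypothetical helper name.

```python
from itertools import product


def rank_f2(rows):
    """Rank over F_2 of a 0/1 matrix given as a list of rows (Gauss-Jordan elimination)."""
    rows = [list(r) for r in rows]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [(x + y) % 2 for x, y in zip(rows[i], rows[rank])]
        rank += 1
    return rank


# f: F_2^4 -> F_2^3, u |-> Au, for an arbitrarily chosen 3 x 4 matrix A
A = [[1, 1, 0, 1],
     [0, 1, 1, 1],
     [1, 0, 1, 0]]       # third row = first row + second row

dim_im = rank_f2(A)     # dim im f = column rank = row rank
kernel = [u for u in product([0, 1], repeat=4)
          if all(sum(a * x for a, x in zip(row, u)) % 2 == 0 for row in A)]
dim_ker = len(kernel).bit_length() - 1   # |ker f| = 2^(dim ker f)
print(dim_ker, dim_im)  # 2 2
```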


Some Homework Problems 

In all the exercises below we write F for the field with two elements. 

(1) The first digit of an ISBN identifies the language in which it is published:
for example, 0 for English, 2 for French, 3 for German. The next three
digits identify the publisher. Suppose that the ISBN of a book published
in German begins 3-540. The English translation of it has an ISBN beginning
0-abc, and the rest of the ISBN is the same as for the German edition. Find
abc in order for this to be true. Is there a unique such abc? Explain.

(2) If in entering an ISBN into the computer one transposes two adjacent digits,
can the computer detect the error?

(3) UPS identifies packages by assigning a 10-digit number consisting of nine
digits plus a check digit: the check digit is the remainder modulo 7 of the
9-digit number. What percentage of single-digit errors will this method
recognize?

(4) Show that a finite domain is a field.

(5) Let $K$ be a finite field. Why is the image of the natural ring homomorphism
$\mathbb{Z} \to K$ isomorphic to $\mathbb{Z}_p$ for some prime $p$? We call this $p$ the characteristic
of $K$.

(6) If $k$ is a subfield of a field $K$, then $K$ can be viewed as a $k$-vector space in a
natural way. Suppose that $K$ is a finite field of characteristic $p$. Why must
the number of elements in $K$ be of the form $p^n$ for some integer $n$?

(7) Let $K \subseteq L$ be finite fields. By the previous exercise there is a prime $p$ and
integers $m$ and $n$ such that $|K| = p^m$ and $|L| = p^n$. What is the relation
between $m$ and $n$? (Hint: count the number of elements in $k^n$ when $k$ is a
finite field.)

(8) Let B be a subset of a vector space V. Show that B is not a basis if it 
contains distinct elements u,v, w,y,z such that u +v = w+y+z. 

(9) Find a subset $B$ of the following elements of $F^6$ that provides a basis for $F^6$:


000000, 011000, 001111, 111010, 010111, 110101, 100010, 101101, 101010, 111111, 010101.


Make sure you prove that the elements in $B$ both span $F^6$ and that they
are linearly independent; you may prove the latter by showing that 0 can
be written in only one way as a linear combination of the elements in $B$,
namely the combination with all coefficients equal to zero.


16. FINITE FIELDS 


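Let $K$ be a finite field. The group $(K, +)$ is finite, so $n.1 = 1 + \cdots + 1 = 0$ for
some positive integer $n$. Let $p$ be the smallest such integer. Then the natural ring
homomorphism $\mathbb{Z} \to K$, $1 \mapsto 1$, has kernel $(p)$. Its image, which is isomorphic to
$\mathbb{Z}/(p)$, is a subring of the field $K$ and hence a domain, so $p$ is prime. Hence $K$ contains
a copy of $\mathbb{F}_p$.

We call $p$ the characteristic of $K$.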


Proposition 16.1. Let $K$ be a finite field of characteristic $p$. Then $|K| = p^n$ for
some $n$.


Proof. We can view $K$ as a vector space over its subfield $\mathbb{F}_p$. Because $K$ is finite
it has finite dimension, say $n$. Hence $K \cong \mathbb{F}_p^n$ as an $\mathbb{F}_p$-vector space. But $|\mathbb{F}_p^n| = p^n$.


Lemma 16.2. Let $p$ be a prime, $n$ a positive integer, and set $q = p^n$. Suppose $a$
and $b$ belong to a commutative ring $R$ in which $p = 0$. Then
$$(a + b)^q = a^q + b^q.$$
Proof. We argue by induction on $n$. For $n = 1$ we have
$$(a + b)^p = \sum_{i=0}^{p} \binom{p}{i} a^i b^{p-i}.$$
If $1 \leq i \leq p - 1$, the binomial coefficient $\binom{p}{i}$ is divisible by $p$, so is zero in $R$. Hence
only the $i = 0$ and $i = p$ terms from the binomial expansion survive in $R$ and give
the result $(a + b)^p = a^p + b^p$.
Now suppose $n > 1$ and write $q = pr$, where $r = p^{n-1}$. Then
$$(a + b)^q = \big((a + b)^r\big)^p = (a^r + b^r)^p = (a^r)^p + (b^r)^p = a^q + b^q,$$
as required.
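Taking $R = \mathbb{F}_p[x]$, $a = x$ and $b = 1$, the lemma says $(x + 1)^q = x^q + 1$, i.e., every binomial coefficient $\binom{q}{i}$ with $0 < i < q$ is divisible by $p$. A short Python check using `math.comb` (available from Python 3.8); the function name is hypothetical. The last call shows that the hypothesis that $q$ is a power of $p$ cannot simply be dropped.

```python
from math import comb


def middle_binomials_vanish(q: int, p: int) -> bool:
    """True when C(q, i) = 0 in F_p for all 0 < i < q, i.e. (x + 1)^q = x^q + 1 in F_p[x]."""
    return all(comb(q, i) % p == 0 for i in range(1, q))


# q a power of p: the lemma applies
print(middle_binomials_vanish(2 ** 3, 2), middle_binomials_vanish(3 ** 2, 3))  # True True
# q = 6 is not a power of 2: C(6, 2) = 15 is odd, so (x + 1)^6 != x^6 + 1 in F_2[x]
print(middle_binomials_vanish(6, 2))                                           # False
```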


Theorem 16.3. Let $p$ be a prime and $n$ a positive integer. Then
(1) There is a field with $p^n$ elements.
(2) If $K$ is a field with $p^n$ elements, then every element in $K$ is a zero of the
polynomial $x^{p^n} - x$.


Proof. (2) Since $K^\times := K - \{0\}$ is a group with $p^n - 1$ elements, $a^{p^n - 1} = 1$ for all
$a \in K - \{0\}$. Hence $a^{p^n} = a$ for all $a \in K$.

(1) Write $q = p^n$. Consider the polynomial $x^q - x \in \mathbb{F}_p[x]$. By Corollary 12.5,
there is a field $L$ containing $\mathbb{F}_p$ and elements $\alpha_1, \ldots, \alpha_q \in L$ such that
$$x^q - x = (x - \alpha_1)(x - \alpha_2) \cdots (x - \alpha_q).$$
Let $K$ be the subset of $L$ consisting of all the $\alpha_i$'s, i.e., the zeroes of $x^q - x$ in $L$. By the
previous lemma, $\alpha_i + \alpha_j$ is again in $K$; so too is $\alpha_i\alpha_j$. Obviously, $0$ and $1$ are in $K$, and
so too are $-\alpha_i$ (because $(-1)^q = -1$ in $L$) and $\alpha_i^{-1}$ if $\alpha_i \neq 0$. Hence $K$ is a subfield of $L$.

It remains to show that $K$ has exactly $q$ elements, i.e., that $x^q - x$ is not divisible
by $(x - \lambda)^2$ for any $\lambda \in L$. If it were, then $\lambda$ would be a zero of both $x^q - x$ and
its derivative, $qx^{q-1} - 1$. But that derivative is $-1$ because $p \mid q$ and $p = 0$ in $\mathbb{F}_p$
and hence in $L$. Hence $x^q - x$ is not divisible by $(x - \lambda)^2$, and we deduce that
$|K| = q$.
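Part (2) can be observed directly in a small field. The sketch below does not follow the splitting-field construction used in the proof of (1); instead, as an assumption for illustration, it builds a field with 8 elements as $\mathbb{F}_2[x]/(x^3 + x + 1)$, which is a field by Proposition 11.5 because $x^3 + x + 1$ has degree 3 and no zero in $\mathbb{F}_2$, hence is irreducible. Elements are 3-bit masks (bit $i$ is the coefficient of $x^i$); the helper names are hypothetical.

```python
MOD = 0b1011  # x^3 + x + 1: degree 3 with no zero in F_2, hence irreducible


def gf8_mul(a, b):
    """Product in F_2[x]/(x^3 + x + 1) of two 3-bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:      # a reached degree 3: replace x^3 by x + 1
            a ^= MOD
    return result


def gf8_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf8_mul(result, a)
    return result


# Theorem 16.3(2) with p^n = 8: every element is a zero of x^8 - x.
print(all(gf8_pow(a, 8) == a for a in range(8)))                             # True
# Each non-zero element has an inverse, as it must in a field.
print(all(any(gf8_mul(a, b) == 1 for b in range(8)) for a in range(1, 8)))  # True
```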
Theorem 16.4. If $K$ is a finite field, then $K^\times$ is a cyclic group.


Proof. Let $e = |K^\times| = q_1^{n_1} \cdots q_t^{n_t}$ as a product of powers of distinct primes. Define
$$e_i = \frac{e}{q_i} \quad \text{and} \quad d_i = \frac{e}{q_i^{n_i}}.$$
Since the polynomial $x^{e_i} - 1$ has at most $e_i < e$ zeroes, there is an $a_i \in K^\times$ such that
$a_i^{e_i} \neq 1$. Define $b_i := a_i^{d_i}$. Then $1 = a_i^e = b_i^{q_i^{n_i}}$ but
$$1 \neq a_i^{e_i} = b_i^{q_i^{n_i - 1}},$$
so $b_i$ has order exactly $q_i^{n_i}$.

Claim: If $G$ is an abelian group and $x, y \in G$ have relatively prime orders $r$ and
$s$, then $xy$ has order $rs$. Proof: Let $d$ denote the order of $xy$. Then $1 = (xy)^{ds} = x^{ds}$,
so $r \mid ds$ and hence $r \mid d$. Similarly, $s \mid d$. Hence $rs \mid d$. But $(xy)^{rs} = 1$, so $d = rs$.

An induction argument based on the claim now shows that the order of $b_1 b_2 \cdots b_t$
is $e$. Hence $K^\times$ is cyclic.
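The proof is constructive, and for $K = \mathbb{F}_p$ with $p$ prime it can be followed step by step. A Python sketch under that assumption (a general finite field would need its own multiplication routine); all function names are hypothetical, factorization is naive trial division, and the built-in three-argument `pow` does modular exponentiation.

```python
def prime_factors(e):
    """Return {q: n} with e the product of the q**n over distinct primes q (trial division)."""
    factors, q = {}, 2
    while q * q <= e:
        while e % q == 0:
            factors[q] = factors.get(q, 0) + 1
            e //= q
        q += 1
    if e > 1:
        factors[e] = factors.get(e, 0) + 1
    return factors


def generator_mod_p(p):
    """A generator of the cyclic group F_p^x (p prime), built as in the proof of Theorem 16.4."""
    e = p - 1                                # e = |K^x|
    g = 1
    for q, n in prime_factors(e).items():
        e_i, d_i = e // q, e // q ** n
        # x^{e_i} - 1 has at most e_i < e zeroes, so some a_i in K^x has a_i^{e_i} != 1
        a_i = next(a for a in range(2, p) if pow(a, e_i, p) != 1)
        b_i = pow(a_i, d_i, p)               # b_i has order exactly q^n
        g = g * b_i % p                      # coprime orders multiply (the Claim)
    return g


def order_mod_p(a, p):
    """Multiplicative order of a non-zero a modulo p, by repeated multiplication."""
    k, x = 1, a % p
    while x != 1:
        x = x * a % p
        k += 1
    return k


g = generator_mod_p(13)
print(g, order_mod_p(g, 13))  # 11 12
```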


Proposition 16.5. Let $R$ be a commutative domain containing a field $k$. If
$\dim_k R < \infty$, then $R$ is a field.


Proof. Fix $0 \neq a \in R$ and define $\phi: R \to R$ by $\phi(x) = ax$. Then $\phi$ is a $k$-linear
map and is injective because $R$ is a domain. Thus $\dim(\operatorname{im} \phi) = \dim R$ by Theorem
15.8. Hence $\operatorname{im} \phi = R$, so $1 = \phi(x)$ for some $x \in R$. Since $ax = 1$, $a$ is a unit.


17. LINEAR CODES 


Do not confuse coding theory with cryptography. In cryptography the key point 
is secrecy; one wishes to send a message in such a way that an unauthorized reader 
cannot understand it. In coding theory secrecy is not an issue. Instead, the goal
is to send a message in such a way that if a modest number of errors occur in 
transmission the recipient will still be able to recover the original message. 

The process involved is the following. The sender begins with the original mes- 
sage, perhaps a photograph. According to some rules the message is translated into 
a string of zeroes and ones. Rather than thinking of this as a single long string of 
zeroes and ones we think of it as a long sequence of chunks, each chunk consisting of 
some specified number of bits. For example, if one has a photograph consisting of
$1024 \times 1024 = 2^{20}$ pixels and each pixel can have one of $128 = 2^7$ color/brightness
levels, then the message consists of $2^{20}$ chunks where each chunk consists of 7 bits.
We call these chunks message words.

We do not send the message words. Instead we add some extra data to each 
message word in a clever way to create a code word and transmit the code word. 
Let’s begin with two simple examples. 


Example 17.1. The (3, 1)-repetition code. Suppose a message word is a single bit, 
0 or 1, and each message word is turned into a 3-bit code word by repeating it 3 
times. Thus 000 is sent rather than 0 and 111 is sent instead of 1. If a single bit 
of a code word is changed in transmission one can recover the sent code word by 
taking the most frequently occurring digit; e.g., if one receives 101 it makes some 
sense to guess that 111 was sent. This allows us to correct one error and is therefore 
called a 1-error correcting code. However, if two errors are made in transmission
we would not correctly decode the received word. ◇
Example 17.2. The (4, 1)-repetition code. This code corrects one error and detects
two errors. As in the previous example, a message word is a single bit, 0 or 1. Now
each message word is turned into a 4-bit code word by repeating it 4 times. Thus
0000 is sent rather than 0 and 1111 is sent instead of 1. If a single bit of a code
word is changed in transmission one can recover the sent code word by taking the
most frequently occurring digit. The (4, 1)-repetition code is better than the
(3, 1)-repetition code in that if two errors are made in transmission we will recognize
that, because one cannot change a code word to a different code word by making at most
two errors. But this code will not correct two errors because if we receive 0011 there is
no basis for reasonably arguing that one of 0000 and 1111 was sent rather than the other.
◇
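Majority-vote decoding for both repetition codes fits in a few lines. A minimal Python sketch with hypothetical function names; returning `None` on a tie is our choice of how to report an error that has been detected but cannot be corrected, which can only happen when the length is even.

```python
def encode_repetition(bit, n):
    """(n, 1)-repetition code: send the message bit n times."""
    return [bit] * n


def decode_repetition(word):
    """Majority vote; None when the votes tie (error detected but not correctable)."""
    ones = sum(word)
    zeros = len(word) - ones
    if ones == zeros:
        return None
    return 1 if ones > zeros else 0


print(decode_repetition([1, 0, 1]))     # 1: the single error in 111 is corrected
print(decode_repetition([0, 0, 1, 1]))  # None: two errors in a (4, 1) word are detected
```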


In the photograph example, we might add another 8 bits to the 7 bits in each
message word in a very particular way. This new chunk consisting of 15 bits is
called a code word, and it is the code word that we transmit. In coding theory such
a 15-bit code word is viewed as an element of the 15-dimensional vector space $\mathbb{F}_2^{15}$.
We often write $C$ for the set of code words. In this example, $C$ is a subset of $\mathbb{F}_2^{15}$
and consists of $2^7$ elements, one code word for each possible message word.

The sequence of code words (i.e., the $2^{20}$ 15-bit code words in the photograph
example) is called the encoded message and the process of translating the original
message to the encoded one is called encoding.

The encoded message is now sent. There is a possibility, in some situations a
virtual certainty, that errors will arise during transmission. A typical cause of such
errors is electro-magnetic interference. In the photograph example, we send a code
word $v \in C$ and the received word $w$ is an element of $\mathbb{F}_2^{15}$ which need not be
$v$, the word sent. If a received word is a valid code word (i.e., $w \in C$) the receiver
assumes that $w$ was the code word sent. However, if the received word is not a
valid code word (i.e., $w \notin C$) the recipient tries to correct $w$ by replacing it by the
code word that is “nearest to it”. After the correction is made, one translates the
code word back to the message word that corresponds to it. This process is called
decoding.

The encoding and decoding processes are of no interest to us here. If you
absolutely must think about them, suppose for simplicity that the original message
is in English, and one simply replaces the letter “a” by 000001, the letter “b” by
000010, etc. But, to repeat, we have no interest in this process, and will consider
it no further.


Example 17.3. The (3, 2) parity check code. Suppose the original message is
composed of the four message words


00, 01, 10, 11.


The message words are the four elements of the vector space $\mathbb{F}_2^2$. A message word
is converted to a code word by adding to the end of it


• a 0 if the message word has an even number of 1's;
• a 1 if the message word has an odd number of 1's.
The following table lists the message words and the corresponding code words:


Message word → Code word
00 → 000
01 → 011
10 → 101
11 → 110


The code words form a subset $C \subseteq \mathbb{F}_2^3$. A vector $v \in \mathbb{F}_2^3$ is a code word if and only if the
sum of its digits is zero. The code words form a vector subspace of $\mathbb{F}_2^3$. They are
the elements in the kernel of the linear map
$$\mathbb{F}_2^3 \to \mathbb{F}_2, \qquad (\alpha_1, \alpha_2, \alpha_3) \mapsto \alpha_1 + \alpha_2 + \alpha_3$$
(see Example 14.5). If we receive a word for which the sum of its digits is 1, we
know an error must have occurred in the transmission. Thus, this code allows us
to recognize if a single error occurred in the transmission of a word. But we are
unable to make a good decision as to what the sent word was; for example, if we
receive the word 001, the word sent might be 000, or 011, or 101, or something
else; the three possibilities listed are those in which only a single bit is messed up
during transmission. This is not an error correcting code.

Do not confuse $C$ and $\mathbb{F}_2^2$. The message words belong to $\mathbb{F}_2^2$ and the code words
belong to $C$; adding the appropriate digit to the end of a message word to produce
a code word is an isomorphism $\mathbb{F}_2^2 \to C$. In fact, $C$ is the image of the linear map
$f: \mathbb{F}_2^2 \to \mathbb{F}_2^3$ given by
$$f(u) = uA, \qquad \text{where} \quad A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix}.$$
Thus the encoding procedure consists of right multiplication by $A$. ◇
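The encoding map and the membership test are both linear and easy to write down. A minimal Python sketch, with hypothetical function names, that encodes by $u \mapsto uA$ over $\mathbb{F}_2$ and tests membership in $C$ with the parity map of Example 14.5.

```python
A = [[1, 0, 1],
     [0, 1, 1]]  # the encoding matrix of the (3, 2) parity check code


def encode(u):
    """Right multiplication u |-> uA over F_2, for a message word u of length 2."""
    return [sum(u[i] * A[i][j] for i in range(2)) % 2 for j in range(3)]


def is_code_word(v):
    """v is in C exactly when it lies in the kernel of (a1, a2, a3) |-> a1 + a2 + a3."""
    return sum(v) % 2 == 0


for u in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(u, "->", encode(u))
```

The output lists the four code words 000, 011, 101 and 110, and flipping any one bit of any of them makes `is_code_word` return `False`.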


Definition 17.4. Fix positive integers $k < n$. An $(n, k)$ binary linear code is a
$k$-dimensional subspace $C$ of $\mathbb{F}_2^n$. We call $C$ an $(n, k)$-code, or block code, or simply
a code. Elements of $C$ are called code words. ◇


Only code words are transmitted but due to errors in transmission any element
of $\mathbb{F}_2^n$ might be received.

Each code word is simply a string of $n$ 0s and 1s, and what is received is a string
of zeros and ones. We make the assumption that all errors are equally likely: i.e.,
if we send a string of $n$ zeroes and ones the probability that the $i$th digit is changed
is independent of $i$ and independent of whether that digit is a zero or one. We also
assume that multiple errors are independent.


Example 17.5. The rectangular (8,4) code. The word “rectangular” refers to the encoding procedure, the manner in which message words are changed into code words. Suppose the message word is $m = abcd \in \mathbb{F}_2^4$, i.e., each of $a, b, c, d$ is 0 or 1. Arrange $a, b, c, d$ into a $2 \times 2$ square that is the top left part of a $3 \times 3$ square and fill in the entries labelled $w, x, y, z$ in the square

    a  b  w
    c  d  x
    y  z  *

so that the sum of the entries in each of the top two rows and in the left-most two columns is zero. Then encode by

$$abcd \mapsto abcdwxyz.$$

For example, to encode 1011 we use

    1  0  1
    1  1  0
    0  1  *

and send 10111001. The recipient checks that $a+b+w = c+d+x = a+c+y = b+d+z = 0$. If this is not the case an error is recognized, and one can correct the error (think about it!). This code also detects two errors (think about it!). You can also check that $C$, the set of code words, is a 4-dimensional subspace of $\mathbb{F}_2^8$. ◊
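Here is one possible answer to the first “think about it!”, written as a small Python sketch. The function names are hypothetical, code words are handled as strings in the order $abcdwxyz$, and the correction rule is our own reading of the four checks: a single error in a data bit fails exactly one row check and one column check, while an error in $w, x$ (or $y, z$) fails exactly one row (or column) check.

```python
def encode_rectangular(m):
    """Encode the bit string abcd as abcdwxyz."""
    a, b, c, d = (int(ch) for ch in m)
    w = (a + b) % 2   # top row a b w sums to zero
    x = (c + d) % 2   # second row c d x sums to zero
    y = (a + c) % 2   # first column a c y sums to zero
    z = (b + d) % 2   # second column b d z sums to zero
    return "".join(str(bit) for bit in (a, b, c, d, w, x, y, z))

def correct_single_error(r):
    """Correct at most one flipped bit in r; return None if the failed
    checks cannot come from a single error."""
    bits = [int(ch) for ch in r]
    a, b, c, d, w, x, y, z = bits
    row = [(a + b + w) % 2, (c + d + x) % 2]   # 1 marks a failed row check
    col = [(a + c + y) % 2, (b + d + z) % 2]   # 1 marks a failed column check
    if sum(row) == 0 and sum(col) == 0:
        return r                                # all four checks pass
    if sum(row) == 1 and sum(col) == 1:
        bits[2 * row.index(1) + col.index(1)] ^= 1   # one of a, b, c, d
    elif sum(row) == 1 and sum(col) == 0:
        bits[4 + row.index(1)] ^= 1                  # w or x
    elif sum(row) == 0 and sum(col) == 1:
        bits[6 + col.index(1)] ^= 1                  # y or z
    else:
        return None
    return "".join(map(str, bits))

print(encode_rectangular("1011"))        # 10111001, as in the example
print(correct_single_error("10011001"))  # third digit flipped; prints 10111001
```

Note that when the routine corrects, some double errors are miscorrected (errors in $w$ and $y$, for instance, look like a single error in $a$), so using the checks to correct one error and using them to detect two errors are different policies.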


The next code is better than the previous one. Like the previous one, its message words consist of four bits, and it corrects one error and detects two errors (proofs later). However, its code words consist of 7 bits rather than the 8 used in the previous example, so it is more efficient than the previous code.


Example 17.6. The Hamming (7,4) code. Here the code words form a 4-dimensional subspace $C \subset \mathbb{F}_2^7$. A basis for $C$ consists of the words

    1 0 0 0 0 1 1
    0 1 0 0 1 0 1
    0 0 1 0 1 1 0
    0 0 0 1 1 1 1

Thus a valid code word is the sum of some of these four vectors. ◊


If $\mathbb{F}_2^d$ = {message words}, it is convenient to define the encoding algorithm to be a linear map $f : \mathbb{F}_2^d \to \mathbb{F}_2^n$ given by right multiplication by a $d \times n$ matrix. Of course we want $f$ to be injective so that different message words give different code words. In that case, if $C$ is the image of $f$, then $f$ is an isomorphism $\mathbb{F}_2^d \to C$, so its inverse, the decoding algorithm, is also given by a linear map (equivalently, by multiplication by a matrix).

For the Hamming code the matrix

$$A = \begin{pmatrix} 1&0&0&0&0&1&1 \\ 0&1&0&0&1&0&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&1&1&1 \end{pmatrix}$$

provides a linear map $f : \mathbb{F}_2^4 \to \mathbb{F}_2^7$,

$$f(a_1, \dots, a_4) = (a_1, \dots, a_4)\,A = (a_1, \dots, a_4,\; a_2 + a_3 + a_4,\; a_1 + a_3 + a_4,\; a_1 + a_2 + a_4).$$

The map $f$ is obviously injective, and the decoding algorithm $C = \operatorname{im}(f) \to \mathbb{F}_2^4$ is simply $(b_1, \dots, b_7) \mapsto (b_1, \dots, b_4)$.
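As a sketch (in Python, with hypothetical function names), the encoding map $f(a) = aA$ and its inverse on $C$ look like this; the matrix is the one displayed above.

```python
# Generator matrix A of the Hamming (7,4) code, as in the text.
A = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(a):
    """f(a) = aA over F_2, for a message word a of length 4."""
    return tuple(sum(a[i] * A[i][j] for i in range(4)) % 2 for j in range(7))

def decode(b):
    """Inverse of f on C = im(f): keep the first four digits."""
    return tuple(b[:4])

print(encode((1, 0, 1, 1)))   # (1, 0, 1, 1, 0, 1, 0)
```

Because $A$ begins with the $4 \times 4$ identity matrix, the message word reappears unchanged in the first four digits, which is why decoding a valid code word is just truncation.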

Discussion. How should we choose $C$? For simplicity we will always work over the field $\mathbb{F}_2$. We should think of the set of message words as fixed, that is, we are given some $\mathbb{F}_2^d$ consisting of message words; a typical example might be $\mathbb{F}_2^8$, a vector space with 256 elements, which is large enough to provide a vector for each letter of the English alphabet, both upper case and lower case, a vector for each digit 0-9, and vectors for various other everyday symbols. We are then free to choose $n > d$ and a $d$-dimensional subspace $C$ of $\mathbb{F}_2^n$. We do not want $n$ too big because this increases transmission time (a single message word consists of $d$ digits but we must send $n$ digits). However, we want $C$ to be well spread out in $\mathbb{F}_2^n$ so that if, for example, a single digit of a code word $c$ is altered during transmission, the received word $w$ is not in $C$ but $c$ is the element of $C$ that is closest to $w$. We obviously need a notion of distance for “closest” to make sense.


17.1. Hamming distance and weight. Let $u, v \in \mathbb{F}_2^n$. The Hamming distance between $u$ and $v$ is

$$d(u, v) := \text{the number of positions where } u \text{ and } v \text{ differ}.$$

Explicitly, if $u = (u_1, \dots, u_n)$ and $v = (v_1, \dots, v_n)$, then

$$d(u, v) = |\{ i \mid u_i \neq v_i \}|.$$

The Hamming weight of $v \in \mathbb{F}_2^n$ is $W(v) :=$ the number of ones in $v$.


The next result shows that the Hamming distance has various good properties that justify our using the word distance: part (3) is called the Triangle Inequality (in some sense it says that the shortest distance between two points is a “straight line”), and part (4) says that the Hamming distance respects the vector space structure on $\mathbb{F}_2^n$.


Lemma 17.7. Let $u, v \in \mathbb{F}_2^n$ and let $d(-,-)$ denote the Hamming distance.
(1) $d(v,u) = d(u,v) = W(u - v)$;
(2) $d(u,v) = 0$ if and only if $u = v$;
(3) $d(u,v) \leq d(u,z) + d(z,v)$ for all $z \in \mathbb{F}_2^n$;
(4) $d(u,v) = d(u+z, v+z)$ for all $z \in \mathbb{F}_2^n$.


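Proof. (1) This is clear because the $i$th entries, $u_i$ and $v_i$, of $u$ and $v$ differ if and only if $u_i - v_i = 1$.
(2) This follows at once from (1).
(3) Fix $z = (z_1, \dots, z_n) \in \mathbb{F}_2^n$. If $u_i \neq v_i$, then either $u_i \neq z_i$ or $v_i \neq z_i$. Hence every position counted by $d(u,v)$ is counted by $d(u,z)$ or by $d(z,v)$, and the inequality follows.
(4) We have $d(u+z, v+z) = W((u+z) - (v+z)) = W(u - v) = d(u, v)$.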
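The four parts of Lemma 17.7 can be checked by brute force in a small case. The Python sketch below (function names are our own) does this for every triple $u, v, z \in \mathbb{F}_2^4$; it is a sanity check, not a substitute for the proof.

```python
from itertools import product

def weight(v):
    """Hamming weight W(v): the number of ones in v."""
    return sum(v)

def distance(u, v):
    """Hamming distance d(u, v): the number of positions where u and v differ."""
    return sum(1 for ui, vi in zip(u, v) if ui != vi)

def add(u, v):
    """Coordinatewise sum in F_2^n (which is also the difference u - v)."""
    return tuple((ui + vi) % 2 for ui, vi in zip(u, v))

# Brute-force check of Lemma 17.7 (1)-(4) over all of F_2^4.
n = 4
vectors = list(product((0, 1), repeat=n))
for u in vectors:
    for v in vectors:
        assert distance(u, v) == distance(v, u) == weight(add(u, v))  # (1)
        assert (distance(u, v) == 0) == (u == v)                        # (2)
        for z in vectors:
            assert distance(u, v) <= distance(u, z) + distance(z, v)    # (3)
            assert distance(u, v) == distance(add(u, z), add(v, z))     # (4)
print("Lemma 17.7 holds on F_2^4")
```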


If a codeword u is transmitted and v is received the number of errors in transmis- 
sion is the number of coordinates in which u and v differ; that is, d(u, v). 

The next result says that if the probability that an error occurs when transmit- 
ting a single digit (bit) is small, then it is more probable that few rather than many 
errors occur in transmission. We shall always make this assumption. 


Proposition 17.8. Let $p$ denote the probability that an error occurs when transmitting a single digit (bit). Let $P(t, n)$ denote the probability that $t$ errors occur when transmitting $n \geq 1$ digits. If $p < \frac{1}{n+1}$, then

$$(1-p)^n = P(0,n) > P(1,n) > P(2,n) > \cdots > P(n-1,n) > P(n,n) = p^n.$$

Proof. There are $\binom{n}{t}$ ways in which exactly $t$ errors can occur when transmitting $n$ digits, so

$$P(t,n) = \binom{n}{t} p^t (1-p)^{n-t}.$$

Therefore

$$\frac{P(t+1,n)}{P(t,n)} = \frac{n!}{(t+1)!\,(n-t-1)!} \cdot \frac{t!\,(n-t)!}{n!} \cdot \frac{p}{1-p} = \frac{n-t}{t+1} \cdot \frac{p}{1-p}.$$

We want to show that this is $< 1$. The hypothesis is that $(n+1)p < 1$, so $(n-t+t+1)p < t+1$ for all $t = 0, 1, \dots, n-1$, and this can be rewritten as $(n-t)p < (t+1)(1-p)$, from which the result follows.
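A quick numerical illustration of Proposition 17.8, as a Python sketch with hypothetical names: it computes $P(t,n) = \binom{n}{t}p^t(1-p)^{n-t}$ for every $t$ and tests whether the sequence is strictly decreasing.

```python
from math import comb

def prob_errors(t, n, p):
    """P(t, n): probability of exactly t errors among n independent bits."""
    return comb(n, t) * p**t * (1 - p) ** (n - t)

def strictly_decreasing(n, p):
    """True if P(0, n) > P(1, n) > ... > P(n, n)."""
    probs = [prob_errors(t, n, p) for t in range(n + 1)]
    return all(a > b for a, b in zip(probs, probs[1:]))

print(strictly_decreasing(7, 0.1))   # p < 1/8, so the proposition applies: True
print(strictly_decreasing(7, 0.2))   # p > 1/8, and here the chain fails: False
```

The hypothesis $p < \frac{1}{n+1}$ is sufficient; the second call merely shows one value of $p$ above the threshold for which the conclusion fails.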


17.2. Nearest neighbor decoding. Proposition 17.8 shows that if $p < \frac{1}{n+1}$, then fewer errors are more likely than more errors, so the codeword nearest to a received word is most likely the codeword that was transmitted. We therefore adopt the following decoding policy:
• a received word is decoded as the codeword nearest to it, and
• if there is more than one codeword nearest to a received word, the decoder records an error.
We call this (maximum-likelihood) nearest neighbor decoding.
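The decoding policy can be sketched directly in Python (names are hypothetical). This version scans the whole code, so it is only sensible for small codes; section 19 describes a more efficient method based on syndromes.

```python
def hamming_distance(u, v):
    return sum(a != b for a, b in zip(u, v))

def nearest_neighbor_decode(w, code):
    """Return the unique code word nearest to w, or None (an error is
    recorded) when several code words are equally near."""
    best = min(hamming_distance(w, c) for c in code)
    nearest = [c for c in code if hamming_distance(w, c) == best]
    return nearest[0] if len(nearest) == 1 else None

parity_code = ["000", "011", "101", "110"]
repetition_code = ["000", "111"]
print(nearest_neighbor_decode("001", parity_code))      # None: three ties
print(nearest_neighbor_decode("010", repetition_code))  # 000
```

The first call is the situation described for the parity-check code: 001 is at distance 1 from three code words, so the decoder can only record an error.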
We say that a linear code
• corrects $t$ errors if every codeword that is transmitted with $\leq t$ errors is correctly decoded by nearest neighbor decoding;
• detects $2t$ errors if a codeword cannot be changed to a different codeword by changing $\leq 2t$ bits.


17.3. Balls. Let $v \in \mathbb{F}_2^n$ and let $t$ be a non-negative number (usually a positive integer). The ball of radius $t$ with center $v$ is

$$B_t(v) := \{ z \in \mathbb{F}_2^n \mid d(v, z) \leq t \}.$$

Proposition 17.9. Let $u, v \in \mathbb{F}_2^n$ and let $s$ and $t$ be non-negative integers. Then

$$B_s(u) \cap B_t(v) \neq \emptyset \iff d(u, v) \leq s + t.$$

Proof. ($\Rightarrow$) If $z$ belongs to both balls, then $d(u,v) \leq d(u,z) + d(z,v) \leq s + t$.
($\Leftarrow$)¹ If $d(u,v) \leq s$, then the balls intersect because $v$ is in both. If $s < d(u,v) \leq s + t$, then we can change $s$ of the digits of $u$ that differ from the corresponding digits of $v$ to get an element $z$; clearly $d(u,z) = s$ and $d(z,v) = d(u,v) - s \leq t$, so $z$ is in both balls.


It is easier to detect errors than to correct them; we saw this with the ISBNs, where one could recognize a non-ISBN but not know which ISBN it was “trying to be”. A linear code $C \subset \mathbb{F}_2^n$ detects $t$ errors if and only if $d(u,v) \geq t+1$ for all $u \neq v$ in $C$.


Lemma 17.10. The following conditions on a linear code $C$ are equivalent:
(1) $C$ corrects $t$ errors;
(2) for all $u \in C$ and all $z \in B_t(u)$, $u$ is the unique element of $C$ having distance $\leq t$ from $z$;
(3) $B_t(u) \cap B_t(v) = \emptyset$ whenever $u$ and $v$ are distinct elements of $C$;
(4) $\min\{ W(u) \mid 0 \neq u \in C \} \geq 2t+1$.


Proof. Better you convince yourself than read what I write. 


¹The implication ($\Rightarrow$) is true for any pair of non-negative real numbers $s$ and $t$. But the implication ($\Leftarrow$) fails if $s$ and $t$ are not integers: for example, if $d(u,v) = 1$ and $s$ and $t$ are positive numbers such that $s + t = 1$, then the intersection of the balls is empty.




Theorem 17.11. The following conditions on a linear code C are equivalent: 


(1) $C$ corrects $t$ errors;
(2) $d(u,v) \geq 2t+1$ for all $u \neq v$ in $C$;
(3) $W(u) \geq 2t+1$ for all $0 \neq u \in C$.


Proof. This follows from the previous two results. 


17.4. (n,k,d) linear codes. An $(n,k,d)$-code is an $(n,k)$-linear code such that

$$d = \min\{ W(u) \mid 0 \neq u \in C \}.$$

In other words, distinct code words are at distance at least $d$ apart, and some pair of code words is exactly distance $d$ apart.


Corollary 17.12. An $(n,k,d)$ code corrects $t$ errors and detects $2t$ errors if and only if $d \geq 2t+1$.


Proof. This is a restatement of Theorem 17.11. 


Lemma 17.13. Let $\mathbb{F}_2^k \cong C \subset \mathbb{F}_2^n$ be an $(n,k)$-linear code.
(1) If $C$ is an $(n,k,d)$-code, then the balls $B_{(d-1)/2}(u)$, $u \in C$, are disjoint.
(2) If $c$ is the largest integer such that the balls $B_c(u)$, $u \in C$, are disjoint, then the minimum distance for the code is either $2c+1$ or $2c+2$.

Proof. (1) If $z$ were in two distinct such balls, say those centered at $u, v \in C$, then $d(u,v) \leq d(u,z) + d(z,v) \leq d - 1$, so the minimum distance would be $\leq d - 1$.

(2) By Proposition 17.9, the fact that $B_c(u) \cap B_c(v) = \emptyset$ implies that $d(u,v) > 2c$, and hence $d(u,v) \geq 2c+1$. However, if the minimum distance were $\geq 2c+3$, Proposition 17.9 would imply that the balls $B_{c+1}(u)$, $u \in C$, were disjoint, contradicting the choice of $c$.


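17.5. Perfect codes. If $a \in \mathbb{R}$, we define $\lfloor a \rfloor :=$ the largest integer that is $\leq a$. An $(n,k,d)$-code $\mathbb{F}_2^k \cong C \subset \mathbb{F}_2^n$ is perfect if $\mathbb{F}_2^n$ is the disjoint union of the balls $B_t(u)$, $u \in C$, where $t = \left\lfloor \frac{d-1}{2} \right\rfloor$.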


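Proposition 17.14. An $(n,k,d)$ code $C \subset \mathbb{F}_2^n$ is perfect if and only if

$$2^{n-k} = \sum_{i=0}^{t} \binom{n}{i}, \qquad \text{where } t = \left\lfloor \frac{d-1}{2} \right\rfloor. \tag{17-1}$$

Proof. Fix $u \in \mathbb{F}_2^n$. The number of elements in $\mathbb{F}_2^n$ that differ from $u$ in exactly $i \leq n$ positions is $\binom{n}{i}$, so the number of elements in $B_t(u)$ is

$$|B_t(u)| = \sum_{i=0}^{t} \binom{n}{i}.$$

($\Rightarrow$) There are $2^n$ elements in $\mathbb{F}_2^n$ and $2^k$ elements in $C$, so if $\mathbb{F}_2^n$ is the disjoint union of the balls $B_t(u)$, $u \in C$, then

$$2^n = 2^k \sum_{i=0}^{t} \binom{n}{i}, \tag{17-2}$$

which is equivalent to (17-1).

($\Leftarrow$) Since $W(u) \geq d \geq 2t+1$ for $0 \neq u \in C$, the balls $B_t(u)$, $u \in C$, are disjoint. Therefore the number of elements in their union is the right-hand side of (17-2), which equals $2^n$ by hypothesis. Hence the union of the balls is equal to $\mathbb{F}_2^n$.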
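The counting condition (17-1) is easy to test by machine. The following Python sketch (the name `satisfies_17_1` is hypothetical) checks only the arithmetic; as the Warning below says, the equation holding does not by itself produce a code.

```python
from math import comb

def ball_size(n, t):
    """Number of words in a Hamming ball B_t(u) in F_2^n."""
    return sum(comb(n, i) for i in range(t + 1))

def satisfies_17_1(n, k, d):
    """Check the counting condition (17-1) with t = floor((d - 1) / 2)."""
    t = (d - 1) // 2
    return 2 ** (n - k) == ball_size(n, t)

for params in [(3, 1, 3), (7, 4, 3), (15, 11, 3), (23, 12, 7), (8, 4, 3)]:
    print(params, satisfies_17_1(*params))
```

The first four parameter triples satisfy (17-1); the last one, which has the parameters of the rectangular code, does not, since $1 + 8 = 9 \neq 2^4$.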
Warning: Proposition 17.14 does not say that a perfect $(n,k,d)$ code exists if equation (17-1) holds. However, if there is an $(n,k,d)$-code such that equation (17-1) holds, then that code must be perfect. Let's look for some perfect codes.


17.6. Perfect 1-error correcting codes. This is the simplest case. By Corollary 17.12, an $(n,k,3)$-code corrects 1 error and detects 2 errors. An $(n,k,3)$ code is perfect if and only if

$$2^{n-k} = 1 + \binom{n}{1} = n + 1,$$

so our search for a perfect $(n,k,3)$-code must begin by taking $n$ to be one less than a power of two.

• A perfect code with $n = 1$ would be useless because then $k$ would be zero, so $\mathbb{F}_2^k = \{0\}$, the only message word is 0, and $C$ is the zero subspace of $\mathbb{F}_2^n$. Such a code can only send one message, a string of zeroes, duh! So the first interesting case of a 1-error correcting code would be when $n = 3$ and $k = 1$.

• There is a perfect (3,1,3)-code. Let $C \subset \mathbb{F}_2^3$ be the subspace $\{000, 111\}$. This is a perfect (3,1,3)-code. The isomorphism $\mathbb{F}_2 = \mathbb{F}_2^1 \to C$ is given by $0 \mapsto 000$ and $1 \mapsto 111$. The message words are the elements of $\mathbb{F}_2$, namely $\{0, 1\}$, and the corresponding code words are 000 and 111. This code triples the length of a message; it detects two errors and corrects one error.

• The next smallest $n$ for which a perfect $(n,k,3)$-code might exist is $n = 2^3 - 1 = 7$. In this case $k$ must be 4.


Proposition 17.15. The Hamming (7,4)-code is a perfect (7,4, 3)-code. 


Proof. Since $2^{7-4} = 8 = 1 + \binom{7}{1}$, Proposition 17.14 shows it is enough to check that the code has minimum distance 3. So we must check that all non-zero code words have weight $\geq 3$ and at least one code word has weight exactly 3. A basis for $C$ is given by

$$a = 1000011, \quad b = 0100101, \quad c = 0010110, \quad d = 0001111.$$

The non-zero elements of $C$ written in “alphabetical order”, namely

    a, ab, abc, abcd, abd, ac, acd, ad, b, bc, bcd, bd, c, cd, d,

where $acd$ denotes $a + c + d$, et cetera, are

    1000011, 1100110, 1110000, 1111111, 1101001, 1010101, 1011010, 1001100,
    0100101, 0110011, 0111100, 0101010, 0010110, 0011001, 0001111.

Since each of these has weight $\geq 3$, and $a$ has weight exactly 3, we are done.
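The list of code words in the proof can be regenerated mechanically. This Python sketch (our own helper names) forms all $2^4$ sums of the basis vectors $a, b, c, d$ and computes the minimum non-zero weight.

```python
from itertools import product

basis = ["1000011", "0100101", "0010110", "0001111"]   # a, b, c, d

def add(u, v):
    """Sum of two words in F_2^7, written as bit strings."""
    return "".join("1" if x != y else "0" for x, y in zip(u, v))

codewords = set()
for coeffs in product((0, 1), repeat=4):
    word = "0000000"
    for chosen, vector in zip(coeffs, basis):
        if chosen:
            word = add(word, vector)
    codewords.add(word)

min_weight = min(w.count("1") for w in codewords if w != "0000000")
print(len(codewords), min_weight)   # 16 3
```

Counting by weight, the 15 non-zero code words split into seven of weight 3, seven of weight 4, and one (1111111) of weight 7.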


The next smallest $n$ for which a perfect $(n,k,3)$ code might exist is $n = 2^4 - 1 = 15$, with $k = 11$. Is there a perfect (15,11,3) code?


17.7. Perfect 3-error correcting codes. An $(n,k,7)$-code is 3-error correcting and will be perfect if and only if

$$2^{n-k} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \binom{n}{3}.$$

A calculation with $n = 23$ gives

$$1 + \binom{23}{1} + \binom{23}{2} + \binom{23}{3} = 1 + 23 + 253 + 1771 = 2048 = 2^{11} = 2^{23-12},$$

so a (23,12,7)-code will be perfect if it exists. How can we find one, or decide that none exists? This is a hard question. There is one, and it is due to Golay. It is intimately related to a sporadic simple group, the Mathieu group $M_{24}$. It is also
related to the most efficient known packing of spheres in 24-dimensional space. See 
http://www.math.uic.edu/ fields/DecodingGolayHTML/introduction. html 

In 1949, Golay published a half-page article, “Notes on Digital Coding”, in the Proceedings of the Institute of Radio Engineers (a forerunner of the IEEE); it is now seen as one of the most important publications of the last century. Today, reliable data transmission uses methods developed from that original article.


17.8. Further remarks. Other important codes are the Golay (11,6,5)-code and the Reed-Muller (32,6,16) code. The latter was used on the Mariner 6, 7, and 9 voyages to transmit pictures back to earth. Each picture consisted of $700 \times 832$ pixels, each of which had $64 = 2^6$ brightness levels of black and white. The cameras produced pictures at the rate of 100,000 bits per second, but data could be transmitted back to earth only at 16,200 bps, so pictures were temporarily stored on tape prior to transmission. For this reason the question of efficiency is of great importance in constructing codes. The Reed-Muller code corrects 7 errors; thus if 7/32, or a little under 22%, of a code word is corrupted in transmission, one can still determine the original code word.

The Voyager 1 and Voyager 2 spacecraft transmitted color pictures of Jupiter and Saturn in 1979 and 1980. Color image transmission required 3 times the amount of data, so the Golay (24,12,8) code was used (this is a jazzed up version of his (23,12,7) code). This Golay code is only 3-error correcting, but could be transmitted at a much higher data rate than the Reed-Muller code because $\frac{24}{12}$ is much smaller than $\frac{32}{6}$. Voyager 2 went on to Uranus and Neptune, and the code was switched to a Reed-Solomon code.

In 1960, Reed and Solomon introduced their codes in a five-page article, “Polynomial Codes over Certain Finite Fields”, in the Journal of the Society for Industrial and Applied Mathematics. The basic idea is this. Let $n = 2^r$ and take $k < n$. (A typical example is $r = 8$.) Let $K = \mathbb{F}_n$ be the field with $n$ elements, and choose a generator, say $\alpha$, for the cyclic group $K - \{0\}$. Given a message word $m = a_0 a_1 a_2 \cdots a_{k-1} \in K^k$, form the polynomial $f = a_0 + a_1 x + a_2 x^2 + \cdots + a_{k-1} x^{k-1} \in K[x]$ and evaluate it at each element of $K$ to obtain a code word

$$F(m) := \bigl(f(0), f(\alpha), f(\alpha^2), \dots, f(\alpha^{n-1})\bigr) \in K^n.$$

Since $f$ is a polynomial of degree $\leq k-1$, we can determine it once we know its value at $k$ different points. Since $n > k$, we can recover $f$, and hence its coefficients, which are the message word, from the code word. The idea behind the error correction is this. If you plot some points on the graph of a polynomial, for argument's sake let's say 20 points of a degree 4 polynomial, and a small number of those points are incorrect, one can “see” that by looking at the graph and still figure out which polynomial is being transmitted.
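To make the encoding map $F$ and the recovery of $f$ concrete, here is a toy Python sketch with assumed parameters: $r = 4$, so $n = 16$ and $K = \mathbb{F}_{16}$ is built as $\mathbb{F}_2[t]/(t^4 + t + 1)$ (the same field used in section 18.5), with $\alpha$ the class of $t$. It shows only encoding and the recovery of $f$ by Lagrange interpolation from $k$ values assumed to be error-free; practical Reed-Solomon decoders also locate and correct corrupted values, which this sketch does not attempt. All names are hypothetical.

```python
# Toy Reed-Solomon encoder over K = GF(16) = F_2[t]/(t^4 + t + 1).
# Field elements are ints 0..15; bit i holds the coefficient of t^i.
MOD = 0b10011  # t^4 + t + 1

def gf_mul(a, b):
    """Multiply two elements of GF(16) (shift-and-add, reducing by MOD)."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in characteristic 2 is XOR
        b >>= 1
        a <<= 1
        if a & 0b10000:          # degree reached 4: replace t^4 by t + 1
            a ^= MOD
    return result

def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result

def gf_inv(a):
    """Inverse of a != 0, using a^15 = 1 in GF(16)."""
    return gf_pow(a, 14)

ALPHA = 0b0010  # the class of t, which generates GF(16) - {0}
N = 16
POINTS = [0] + [gf_pow(ALPHA, i) for i in range(1, N)]  # 0, alpha, ..., alpha^(n-1)

def poly_eval(coeffs, x):
    """Evaluate a0 + a1 x + ... by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc

def rs_encode(message):
    """F(m) = (f(0), f(alpha), ..., f(alpha^(n-1))) for f with coefficients m."""
    return [poly_eval(message, x) for x in POINTS]

def rs_recover(codeword, k):
    """Lagrange interpolation through the first k evaluation points.
    Assumes those k values arrived without error (no error correction)."""
    xs, ys = POINTS[:k], codeword[:k]
    coeffs = [0] * k
    for i in range(k):
        basis, denom = [1], 1    # prod_{j != i} (x - x_j) and its value at x_i
        for j in range(k):
            if j == i:
                continue
            # multiply basis by (x - x_j) = (x + x_j) in characteristic 2
            basis = [0] + basis
            for d in range(len(basis) - 1):
                basis[d] ^= gf_mul(basis[d + 1], xs[j])
            denom = gf_mul(denom, xs[i] ^ xs[j])
        scale = gf_mul(ys[i], gf_inv(denom))
        for d, c in enumerate(basis):
            coeffs[d] ^= gf_mul(c, scale)
    return coeffs

message = [3, 7, 0, 12]          # k = 4 symbols of K
codeword = rs_encode(message)    # n = 16 symbols of K
print(rs_recover(codeword, 4))   # [3, 7, 0, 12]
```

Any $k$ of the $n$ coordinates would do for the recovery; the sketch simply uses the first $k$.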

The 2004 missions to Mars and Saturn transmitted photographs consisting of $1024 \times 1024$ pixels, and each pixel consisted of 12 bits giving its color, brightness, intensity, etc. Thus each picture required $2^{10} \times 2^{10} \times 12$ bits, which is a little more than 12 million bits per photograph.

Error-correcting codes are essential for computer disk drives, CD players, television transmissions, phone calls, and all kinds of data transmission over both short and long distances. Careful engineering can reduce the error rate to what may sound like a negligible level (the industry standard for hard disk drives is 1 in 10 billion), but given today's volume of information processing that “negligible” level is an invitation to disaster. Billions and billions of dollars of equipment and millions of lives depend on error correcting codes. The world would look very different without these ideas, all of which grow out of elementary abstract algebra. The abstract algebra on which coding theory rests was developed over the last 150 years, and it was in turn developed to answer questions and problems that had arisen from within mathematics over the two or three centuries prior to that. In particular, none of this mathematics was developed with the goal of being used in the real world. Keep this in mind the next time you hear about cutting the budget for the National Science Foundation because the research being done is “of no practical use.”


18. BCH CODES 


These codes were invented by Bose, Chaudhuri, and Hocquenghem. The following definition is vague but will tell you where we are headed.

Definition 18.1. A BCH code is a linear code of the form

$$C = \frac{(g)}{(x^n - 1)} \subset \frac{F[x]}{(x^n - 1)} \cong F^n,$$

where $g \in F[x]$ is a (carefully chosen) polynomial dividing $x^n - 1$. ◊


18.1. Notation. We will write elements of $F[x]/(x^n - 1)$ as truncated polynomials of degree $\leq n - 1$. For example, in $F[x]/(x^7 - 1)$, using $x^7 = 1$ and $2 = 0$, we have

$$(1 + x^2 + x^4)(1 + x^3 + x^4 + x^6) = 1 + x^3 + x^4 + x^6 + x^2 + x^5 + x^6 + x^8 + x^4 + x^7 + x^8 + x^{10} = x^2 + x^5.$$

We will adopt $1, x, x^2, \dots, x^{n-1}$ as a basis for $F[x]/(x^n - 1)$ and write $(a_0, a_1, \dots, a_{n-1})$ for the polynomial $a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \in F[x]/(x^n - 1)$. Hence, when we speak of the weight of an element in $F[x]/(x^n - 1) \cong F^n$ we mean its weight with respect to this basis, i.e., the number of non-zero coefficients/entries in $(a_0, a_1, \dots, a_{n-1}) = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1}$.
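Multiplication in $F[x]/(x^n - 1)$ over $F = \mathbb{F}_2$ only needs exponents added mod $n$ and coefficients added mod 2. The Python sketch below (hypothetical name `cyclic_mul`) stores a polynomial as the sorted list of exponents of its non-zero terms, so the weight of an element is just the length of its list.

```python
def cyclic_mul(f, g, n):
    """Multiply f, g in F_2[x]/(x^n - 1); polynomials are lists of the
    exponents of their non-zero terms."""
    counts = {}
    for i in f:
        for j in g:
            e = (i + j) % n          # x^n = 1, so exponents wrap around mod n
            counts[e] = counts.get(e, 0) + 1
    # over F_2 a term survives iff it occurs an odd number of times
    return sorted(e for e, c in counts.items() if c % 2 == 1)

product_terms = cyclic_mul([0, 2, 4], [0, 3, 4, 6], 7)
print(product_terms, "weight", len(product_terms))   # [2, 5] weight 2
```

This reproduces the example above: the product is $x^2 + x^5$, an element of weight 2.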




18.2. The recipe for a BCH code. Choose integers $t, r$ with $t < 2^r - 1$ and write $n = 2^r - 1$. We will construct an $(n,k,d)$ linear code where $d \geq 2t+1$. It will detect $2t$ errors and correct $t$ errors. The code is constructed in 3 steps.

(1) Let $K = \mathbb{F}_{2^r}$ be the field with $2^r$ elements. Fix some $\alpha \in K$ such that
$$K = \{0\} \cup \{1, \alpha, \alpha^2, \dots, \alpha^{n-1}\}.$$
Such an $\alpha$ exists by Theorem 16.4.
(2) The minimal polynomial of $\beta \in K$ is the non-zero polynomial $m \in F[x]$ of smallest degree such that $m(\beta) = 0$. Compute the minimal polynomials of $\alpha, \alpha^2, \dots, \alpha^{2t}$, and write $m_i$ for the minimal polynomial of $\alpha^i$. Define
$$g = \operatorname{lcm}\{m_1, \dots, m_{2t}\}.$$
(3) The BCH code of length $n$ and designed distance $2t+1$ is
$$\frac{(g)}{(x^n - 1)} \subset \frac{F[x]}{(x^n - 1)}.$$
We will show that $g$ divides $x^n - 1$ in section 18.4, so that $(g)$ does contain $(x^n - 1)$.


Theorem 18.2. The minimum distance of this BCH code is $\geq 2t+1$.
We will prove this in subsection 18.6.
18.3. Minimal polynomials. 


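Lemma 18.3. Let $k \subset K$ be fields such that $\dim_k K < \infty$. Let $m \in k[x]$ be the minimal polynomial of an element $\beta \in K$. Then $m$ is irreducible.

Proof. Since $\dim_k K < \infty$, the powers $1, \beta, \beta^2, \dots$ are linearly dependent. Hence there is a non-zero polynomial $f$ such that $f(\beta) = 0$. Let $m$ be the minimal polynomial. If $m = gh$, then $0 = m(\beta) = g(\beta)h(\beta)$, so either $g(\beta) = 0$ or $h(\beta) = 0$. By the minimality of $\deg m$, whichever of $g$, $h$ vanishes at $\beta$ has degree $\deg m$, so the other one is a non-zero constant. It follows that $m$ is irreducible.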


We have seen many times that if $a$ and $b$ belong to a field of characteristic $p$, then $(a+b)^p = a^p + b^p$. It follows from this, together with $c^p = c$ for every $c \in \mathbb{F}_p$, that if $f$ is a polynomial with coefficients in $\mathbb{F}_p$, then $f(a)^p = f(a^p)$.


Lemma 18.4. Let $K$ be a field of characteristic $p$ such that $\dim_{\mathbb{F}_p} K < \infty$. Let $m \in \mathbb{F}_p[x]$ be the minimal polynomial of an element $\beta \in K$. Then $m$ is also the minimal polynomial of $\beta^p, \beta^{p^2}, \beta^{p^3}, \dots$.

Proof. If $f \in \mathbb{F}_p[x]$, then $f(\beta)^p = f(\beta^p)$, so $f(\beta) = 0$ if and only if $f(\beta^p) = 0$. Hence $\beta$ and $\beta^p$ have the same minimal polynomial. Applying this to $\beta^p, \beta^{p^2}, \dots$ in turn gives the other cases.


18.4. The $g$ defined in section 18.2 divides $x^n - 1$. Because the multiplicative group $(K \setminus \{0\}, \cdot)$ has order $n = 2^r - 1$, the order of every element in it divides $n$. In other words, $x^n - 1$ vanishes on all $n$ elements of $K \setminus \{0\}$. Since $x^n - 1$ is in the kernel of the evaluation map $f \mapsto f(\beta)$, and that kernel is the ideal generated by the minimal polynomial of $\beta$, the minimal polynomial of every $0 \neq \beta \in K$ divides $x^n - 1$.

Hence the least common multiple of all such minimal polynomials must divide $x^n - 1$; in particular, $g$ divides $x^n - 1$.
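For the case $n = 15$ treated in section 18.5, the divisibility can also be checked by polynomial division over $\mathbb{F}_2$. In this Python sketch a polynomial is an integer whose bit $i$ is the coefficient of $x^i$; the function name and the two sample polynomials (the minimal polynomials $m_1$ and $m_3$ found in section 18.5) are our own choices for illustration.

```python
def poly_mod(a, b):
    """Remainder of a modulo b in F_2[x]; polynomials are ints (bit i = coeff of x^i)."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)   # subtract (= add) a shifted copy of b
    return a

x15_minus_1 = (1 << 15) | 1     # x^15 - 1 = x^15 + 1 over F_2
m1 = 0b10011                    # x^4 + x + 1
m3 = 0b11111                    # x^4 + x^3 + x^2 + x + 1
print(poly_mod(x15_minus_1, m1), poly_mod(x15_minus_1, m3))   # 0 0
```

By contrast, `poly_mod(x15_minus_1, 0b101)` is non-zero: $x^2 + 1 = (x+1)^2$ is not a minimal polynomial, and $x^{15} - 1$ has no repeated factors.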


18.5. A (15,7,5)-code. Here $n = 2^4 - 1 = 15$, $r = 4$, and $t = 2$. We realize $K = \mathbb{F}_{16}$ as

$$K = \frac{F[t]}{(t^4 + t + 1)}.$$

The irreducibility of $t^4 + t + 1$ ensures that $K$ is a field; to see that $t^4 + t + 1$ is irreducible, observe that it has no linear factors because it has no zeroes in $F = \mathbb{F}_2$, and it has no quadratic factors because the only degree two irreducible is $t^2 + t + 1$ and $t^4 + t + 1 \neq (t^2 + t + 1)^2 = t^4 + t^2 + 1$.

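Let $\alpha = [t]$ denote the image of $t$ in $K$. Straightforward computations give

$$\begin{array}{llll}
\alpha = \alpha & \alpha^5 = \alpha^2 + \alpha & \alpha^9 = \alpha^3 + \alpha & \alpha^{13} = \alpha^3 + \alpha^2 + 1 \\
\alpha^2 = \alpha^2 & \alpha^6 = \alpha^3 + \alpha^2 & \alpha^{10} = \alpha^2 + \alpha + 1 & \alpha^{14} = \alpha^3 + 1 \\
\alpha^3 = \alpha^3 & \alpha^7 = \alpha^3 + \alpha + 1 & \alpha^{11} = \alpha^3 + \alpha^2 + \alpha & \alpha^{15} = 1 \\
\alpha^4 = \alpha + 1 & \alpha^8 = \alpha^2 + 1 & \alpha^{12} = \alpha^3 + \alpha^2 + \alpha + 1 &
\end{array}$$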
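The table of powers can be generated by repeatedly multiplying by $\alpha$ and replacing $\alpha^4$ by $\alpha + 1$. In this Python sketch (hypothetical names) an element of $K$ is a 4-bit integer whose bit $i$ is the coefficient of $\alpha^i$.

```python
MOD = 0b10011  # t^4 + t + 1; bit i is the coefficient of alpha^i

def times_alpha(v):
    """Multiply an element of K = F[t]/(t^4 + t + 1) by alpha = [t]."""
    v <<= 1
    if v & 0b10000:
        v ^= MOD            # replace alpha^4 by alpha + 1
    return v

def as_poly(v):
    """Write v as a sum of 1, alpha, alpha^2, alpha^3 (highest power first)."""
    names = ["1", "α", "α^2", "α^3"]
    terms = [names[i] for i in range(3, -1, -1) if (v >> i) & 1]
    return " + ".join(terms) if terms else "0"

powers = [1]                 # powers[i] = alpha^i
for i in range(1, 16):
    powers.append(times_alpha(powers[-1]))
    print(f"α^{i} = {as_poly(powers[i])}")
```

The fifteen printed powers are distinct and $\alpha^{15} = 1$, which confirms that $\alpha$ generates $K \setminus \{0\}$.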


Let's write $m_i$ for the minimal polynomial of $\alpha^i$. We want to find $g \in F[x]$, the least common multiple of $m_1, m_2, \dots, m_4 = m_{2t}$. By definition of $\alpha$, it is a zero of $x^4 + x + 1$, which is irreducible, so

$$m_1 = x^4 + x + 1.$$

Because we are working over a field of characteristic two, Lemma 18.4 shows that this is also the minimal polynomial of $\alpha^2, \alpha^4, \alpha^8$, so $m_1 = m_2 = m_4$.
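To find $m_3$, the minimal polynomial of $\alpha^3$, notice that $m_3$ is also the minimal polynomial of $\alpha^6, \alpha^{12}, \alpha^{24} = \alpha^9$. These four elements are distinct zeros of $m_3$, and the product below turns out to have coefficients in $F$, so

$$\begin{aligned}
m_3(x) &= (x - \alpha^3)(x - \alpha^6)(x - \alpha^{12})(x - \alpha^9) \\
&= \bigl(x^2 - (\alpha^3 + \alpha^6)x + \alpha^9\bigr)\bigl(x^2 - (\alpha^{12} + \alpha^9)x + \alpha^{21}\bigr) \\
&= (x^2 + \alpha^2 x + \alpha^9)(x^2 + \alpha^8 x + \alpha^6) \\
&= x^4 + x^3 + x^2 + x + 1.
\end{aligned}$$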
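Since $m_1$ and $m_3$ are distinct irreducible polynomials, their least common multiple is their product. Hence

$$g(x) = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1) = x^8 + x^7 + x^6 + x^4 + 1.$$

Hence the BCH code is

$$C = \frac{(g)}{(x^{15} - 1)} = \frac{(x^8 + x^7 + x^6 + x^4 + 1)}{(x^{15} - 1)} \subset \frac{F[x]}{(x^{15} - 1)} \cong F^{15}.$$

Notice that the weight of the code word $g$ is 5. To verify our claim that this is a (15,7,5)-code we must show that every non-zero multiple of $g$, i.e., every $gh$ with $h \neq 0$ and $\deg h < 7$, has at least five non-zero coefficients. There are $2^7 - 1$ such $h$'s, so we surely do not want to do this by checking every case. The result is Theorem 18.2, which we now prove.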
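Checking every case by hand is impractical, but a computer does it instantly. The Python sketch below (names are our own; polynomials are integers with bit $i$ the coefficient of $x^i$) computes $g = m_1 m_3$ and the weights of all $2^7 - 1$ non-zero code words $gh$. Since $\deg(gh) \leq 14$, no reduction modulo $x^{15} - 1$ is needed.

```python
def poly_mul(a, b):
    """Product in F_2[x]; polynomials are ints with bit i = coefficient of x^i."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result

m1 = 0b10011            # x^4 + x + 1
m3 = 0b11111            # x^4 + x^3 + x^2 + x + 1
g = poly_mul(m1, m3)    # lcm(m1, m3) = m1 * m3

# every non-zero code word is g*h with h != 0 and deg h <= 6
weights = [bin(poly_mul(g, h)).count("1") for h in range(1, 2 ** 7)]
print(bin(g), min(weights))   # 0b111010001 5
```

This brute-force check confirms the minimum distance 5 for this particular code; Theorem 18.2 gives the bound $\geq 2t+1$ for every BCH code without any enumeration.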


18.6. Proof of Theorem 18.2. We must show that a non-zero codeword has weight $\geq 2t+1$. In other words, we must show that at least $2t+1$ coefficients of a non-zero

$$f = a_0 + a_1 x + \cdots + a_{n-1} x^{n-1} \in (g)$$

are non-zero. Suppose to the contrary that the number of non-zero $a_j$'s is $d \leq 2t$.

Since $f$ is a multiple of the minimal polynomial of $\alpha^i$ for $1 \leq i \leq 2t$, we have $f(\alpha^i) = 0$ for $1 \leq i \leq 2t$. We can express this as the matrix equation


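$$(a_0 \; a_1 \; \cdots \; a_{n-1}) \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \alpha & \alpha^2 & \cdots & \alpha^{2t} \\ \alpha^2 & \alpha^4 & \cdots & \alpha^{4t} \\ \vdots & \vdots & & \vdots \\ \alpha^{n-1} & \alpha^{2(n-1)} & \cdots & \alpha^{2t(n-1)} \end{pmatrix} = 0.$$

Let $a_{j_1}, \dots, a_{j_d}$ be the non-zero coefficients of $f$. Taking the submatrix consisting of the first $d$ columns and the rows labelled $j_1, \dots, j_d$, we obtain

$$(a_{j_1} \; a_{j_2} \; \cdots \; a_{j_d}) \begin{pmatrix} \alpha^{j_1} & \alpha^{2j_1} & \cdots & \alpha^{dj_1} \\ \alpha^{j_2} & \alpha^{2j_2} & \cdots & \alpha^{dj_2} \\ \vdots & \vdots & & \vdots \\ \alpha^{j_d} & \alpha^{2j_d} & \cdots & \alpha^{dj_d} \end{pmatrix} = 0.$$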


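Since the row vector $(a_{j_1} \cdots a_{j_d})$ is non-zero, it follows that the determinant of this $d \times d$ matrix is zero. But, after taking out the factor $\alpha^{j_k}$ from row $k$ for each $k$, that determinant is

$$\alpha^{j_1 + \cdots + j_d} \det \begin{pmatrix} 1 & \alpha^{j_1} & (\alpha^{j_1})^2 & \cdots & (\alpha^{j_1})^{d-1} \\ 1 & \alpha^{j_2} & (\alpha^{j_2})^2 & \cdots & (\alpha^{j_2})^{d-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^{j_d} & (\alpha^{j_d})^2 & \cdots & (\alpha^{j_d})^{d-1} \end{pmatrix}.$$

However, it follows from the Vandermonde formula (Theorem 18.6 below, applied to the transpose of this matrix) that this determinant is zero only if there are two different values of $k$ for which the elements $\alpha^{j_k}$ are the same; this contradicts the fact that $\alpha^i$, $0 \leq i \leq n-1$, are distinct. We conclude that the number of non-zero $a_j$'s must be $\geq 2t+1$.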


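18.7. The Vandermonde determinant.
Lemma 18.5. Let $k$ be a field of characteristic not two. If $f(x_1, \dots, x_n) \in k[x_1, \dots, x_n]$ is an alternating homogeneous polynomial of degree $\frac{1}{2}n(n-1)$, then

$$f(x_1, \dots, x_n) = c \prod_{i<j} (x_i - x_j)$$

for some $c \in k$.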


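Proof. To say that $f$ is alternating means that the value of $f$ changes by a sign if the positions of two of the variables are switched:

$$f(\dots, x_i, \dots, x_j, \dots) = -f(\dots, x_j, \dots, x_i, \dots).$$

Write $f$ as a polynomial in $x_1$ and $x_2$ with coefficients in $k[x_3, \dots, x_n]$, say

$$f = \sum_{i,j} a_{ij} x_1^i x_2^j.$$

The alternating hypothesis says that

$$\sum_{i,j} a_{ij} x_1^i x_2^j = -\sum_{i,j} a_{ij} x_2^i x_1^j,$$

so $a_{ij} = -a_{ji}$ (in particular $a_{ii} = 0$, because $2 \neq 0$ in $k$), and we can now write

$$f = \sum_{i<j} a_{ij} \,(x_1^i x_2^j - x_1^j x_2^i).$$

However, for $i < j$ there is a factorization

$$x^i y^j - x^j y^i = x^i y^i \,(y^{j-i} - x^{j-i}) = (y - x)\, x^i y^i \,(y^{j-i-1} + x y^{j-i-2} + \cdots + x^{j-i-1}),$$

so we can write $x_1^i x_2^j - x_1^j x_2^i = (x_1 - x_2) g$ for some polynomial $g \in k[x_1, x_2]$. Hence $f$ is divisible by $x_1 - x_2$. We can repeat this argument for every pair of variables $x_i$ and $x_j$ to see that $x_i - x_j$ divides $f$. Since these factors are pairwise non-associate irreducible elements of the unique factorization domain $k[x_1, \dots, x_n]$, it follows that $f$ is a multiple of the product

$$\prod_{i<j} (x_i - x_j)$$

of those factors. This product has $\binom{n}{2}$ factors, so it has the same degree as $f$; since $f$ is homogeneous, the quotient is a constant. The result follows.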


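Theorem 18.6. The following formula holds for the determinant:

$$\det \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \\ x_1^2 & x_2^2 & \cdots & x_n^2 \\ \vdots & \vdots & & \vdots \\ x_1^{n-1} & x_2^{n-1} & \cdots & x_n^{n-1} \end{pmatrix} = \prod_{i<j} (x_j - x_i).$$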
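Proof. Both sides are polynomials with integer coefficients, so it suffices to prove the identity over $k = \mathbb{Q}$; it then holds over every field, in particular over the fields of characteristic two used in section 18.6. Write $f(x_1, \dots, x_n) = \det A$ for the determinant of this matrix; it is a polynomial in $x_1, \dots, x_n$. Since the determinant is a sum of products, and each product is the product of a single term from each row, each term of $f = \det A$ has degree

$$0 + 1 + 2 + \cdots + (n-1) = \tfrac{1}{2}n(n-1) = \binom{n}{2}.$$

Because the determinant changes sign when two columns are switched, $f$ is an alternating function. By Lemma 18.5, the only such polynomials are the scalar multiples of $\prod_{i<j}(x_j - x_i)$. However, by multiplying the diagonal terms, we see that the coefficient of $x_2 x_3^2 \cdots x_n^{n-1}$ in $f$ is one, and this is also its coefficient in $\prod_{i<j}(x_j - x_i)$. So the multiple must be one.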
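A numerical spot check of Theorem 18.6, as a Python sketch with hypothetical names: it computes the determinant exactly with rational arithmetic (Gaussian elimination) and compares it with $\prod_{i<j}(x_j - x_i)$ for a few integer inputs. A check on examples is of course not a proof.

```python
from fractions import Fraction
from itertools import combinations

def det(matrix):
    """Determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result                 # a row swap changes the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return result

def vandermonde(xs):
    """Row i holds the i-th powers x_1^i, ..., x_n^i, as in Theorem 18.6."""
    return [[x ** i for x in xs] for i in range(len(xs))]

def product_formula(xs):
    result = 1
    for i, j in combinations(range(len(xs)), 2):   # all pairs i < j
        result *= xs[j] - xs[i]
    return result

xs = [2, -1, 5, 3]
print(det(vandermonde(xs)), product_formula(xs))   # 432 432
```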


19. IMPLEMENTING NEAREST NEIGHBOR DECODING 


Dot product on $F^n$. There is a dot product on $F^n$, namely

$$u \cdot v = \sum_{i=1}^{n} u_i v_i,$$

where $u = (u_1, \dots, u_n)$ and $v = (v_1, \dots, v_n)$. We say that $u$ and $v$ are orthogonal if $u \cdot v = 0$. We write

$$u^\perp := \{ v \in F^n \mid u \cdot v = 0 \}.$$

If $u$ is non-zero, $u^\perp$ is a subspace of codimension one (that is, of dimension $n-1$); this is because the map $v \mapsto u \cdot v$ is a surjective linear map $F^n \to F$ (see Theorem 15.8). More generally, if $D$ is any subset of $F^n$ we define

$$D^\perp := \{ v \in F^n \mid u \cdot v = 0 \text{ for all } u \in D \};$$

this is a subspace of $F^n$.
Let $e_1, \dots, e_n$ be the usual basis for $F^n$, i.e., $e_i$ has a 1 in the $i$th position and zeroes elsewhere. Then

$$e_i \cdot e_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$


More about the Hamming (7,4,3)-code. Let $F^4 \cong C \subset F^7$ be the Hamming code. Claim: the vectors

$$x = 0001111, \quad y = 0110011, \quad z = 1010101$$

belong to $C^\perp$. To verify this it suffices to check that $x$, $y$, and $z$ are orthogonal to a set of basis vectors for $C$. Now $C$ was defined to be the linear span of the elements

$$a = 1000011, \quad b = 0100101, \quad c = 0010110, \quad d = 0001111.$$

It is routine to verify that $x, y, z$ are orthogonal to $a, b, c, d$. Define the matrices


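$$G = \begin{pmatrix} 1&0&0&0&0&1&1 \\ 0&1&0&0&1&0&1 \\ 0&0&1&0&1&1&0 \\ 0&0&0&1&1&1&1 \end{pmatrix}, \qquad E = \begin{pmatrix} 0&0&1 \\ 0&1&0 \\ 0&1&1 \\ 1&0&0 \\ 1&0&1 \\ 1&1&0 \\ 1&1&1 \end{pmatrix}.$$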


The orthogonality can now be expressed as saying that $GE = 0$ (the columns of $E$ are $x$, $y$, and $z$). Since $E$ is a $7 \times 3$ matrix, it gives a linear map $F^7 \to F^3$, $u \mapsto uE$. Now $G$ is the encoding matrix, in the sense that the linear map $F^4 \to F^7$ given by $v \mapsto vG$ encodes the message words as code words. It is easy to see that the map $F^7 \to F^3$, $u \mapsto uE$, is surjective (because the rows of $E$ span $F^3$). By Theorem 15.8, the kernel of this map therefore has dimension four; but the kernel contains $C$, which is 4-dimensional, so the kernel is exactly $C$. In other words, $u \in C \iff uE = 0$; thus $E$ provides a way of checking whether the received word is a valid code word.
The next lemma shows that $E$ does an awful lot more than that.


Lemma 19.1. Consider the Hamming (7,4,3)-code $F^4 \cong C \subset F^7$. Suppose that a code word $u$ is transmitted and an error occurs in the $i$th digit, so that $v = u + e_i$ is received. Then $vE$ is the binary representation of the number $i$. Thus $vE$ tells us exactly which digit is wrong.


Proof. Now $vE = (u + e_i)E = uE + e_iE = e_iE$, so we simply need to check that for the seven basis vectors $e_i \in F^7$, $e_iE$ is the binary representation of the integer $i$, $i = 1, \dots, 7$. Now $e_iE$ is equal to the $i$th row of $E$. Since the $i$th row of $E$ is indeed the binary representation of the integer $i$, the result follows.
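Here is the party trick of the next paragraph in code form, as a Python sketch with hypothetical names. It builds $E$ row by row from the binary representations of $1, \dots, 7$, checks $GE = 0$, and uses $vE$ to locate and flip a single wrong digit.

```python
# Row i of E (i = 1, ..., 7) is i written in binary; its columns are x, y, z.
E = [[(i >> 2) & 1, (i >> 1) & 1, i & 1] for i in range(1, 8)]
G = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def times_E(v):
    """The product vE in F^3 for a word v in F^7."""
    return [sum(v[r] * E[r][c] for r in range(7)) % 2 for c in range(3)]

def error_position(v):
    """Read vE as a binary number: 0 means vE = 0, i.e. v is a code word."""
    s = times_E(v)
    return 4 * s[0] + 2 * s[1] + s[2]

def correct(v):
    """Flip the digit named by vE (assumes at most one error occurred)."""
    w = list(v)
    i = error_position(v)
    if i:
        w[i - 1] ^= 1
    return w

assert all(times_E(row) == [0, 0, 0] for row in G)   # GE = 0

u = [1, 0, 1, 1, 0, 1, 0]            # a + c + d, a code word
v = [1, 0, 1, 1, 1, 1, 0]            # the fifth digit was changed
print(error_position(v), correct(v))  # 5 [1, 0, 1, 1, 0, 1, 0]
```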


If you need more convincing try a few examples—impress your friends and family: 
let them pick any element u of C, change one digit, and you then tell them which 
digit they changed! 


All this is capable of generalization. Observe that $G$ is of the form $(I_4 \mid A)$, where $I_4$ is the $4 \times 4$ identity matrix and $A$ is a $4 \times 3$ matrix. We call a $k \times n$ matrix of the form $G = (I_k \mid A)$ a standard generator matrix. The map $F^k \to F^n$, $v \mapsto vG$, is injective, so its image $C$ is isomorphic to $F^k$. Thus $G$ gives rise to an $(n,k)$-linear code, $F^k \cong C = \{ vG \mid v \in F^k \} \subset F^n$.


Theorem 19.2. Let $G = (I_k \mid A)$ be a $k \times n$ standard generator matrix and define the $n \times (n-k)$ matrix

$$H = \begin{pmatrix} A \\ I_{n-k} \end{pmatrix}.$$

Then $GH = 0$ and for each $w \in F^n$,

(1) $wH = 0$ if and only if $w \in C := \{ vG \mid v \in F^k \}$;
(2) if $e \in F^n$ is a smallest weight word such that $wH = eH$, then the rule $w \mapsto w - e$ is nearest neighbor decoding.


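Proof. We use the notation

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Notice that $A$ is a $k \times (n-k)$ matrix. Let's write $A = (a_{ij})_{1 \leq i \leq k,\; 1 \leq j \leq n-k}$. The $ij$-entry of $GH$ is the product of the $i$th row of $G$ with the $j$th column of $H$, namely

$$(\delta_{i1} \; \cdots \; \delta_{ik} \; a_{i1} \; \cdots \; a_{i,n-k}) \begin{pmatrix} a_{1j} \\ \vdots \\ a_{kj} \\ \delta_{1j} \\ \vdots \\ \delta_{n-k,j} \end{pmatrix} = a_{ij} + a_{ij} = 0.$$

Hence $GH = 0$.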


(1) Let $\phi : F^n \to F^{n-k}$ be the linear map $\phi(w) = wH$. Because the identity matrix $I_{n-k}$ appears in the lower part of $H$, $\phi$ is surjective. But

$$n = \dim F^n = \dim(\operatorname{im} \phi) + \dim(\ker \phi),$$

so $\dim(\ker \phi) = k$. However, since $GH = 0$ and $C$ is the image of the map $F^k \to F^n$, $v \mapsto vG$, we have $C \subset \ker \phi$. Since $C$ and $\ker \phi$ have the same dimension, they are equal. Thus $wH = 0$ if and only if $w \in C$.

(2) We must show that $w - e$ is the codeword that is nearest to $w$. First, $w - e$ is a codeword because $(w - e)H = 0$. Notice that $\{ v \in F^n \mid vH = wH \} = w + C$; in particular, this is a coset of $C$. Thus $e$ is an element of $w + C$ of minimal weight. If $u$ is a code word, then $w - u \in w + C$, so $W(w - u) \geq W(e)$. Hence

$$d(w, u) = W(w - u) \geq W(e) = d(w, w - e).$$

Hence $u$ is no closer to $w$ than $w - e$ is.


We call the $H$ in Theorem 19.2 the parity check matrix for the code generated by $G$.
Here is one way to implement this decoding algorithm.
(1) Find an element of minimal weight in each coset $w + C$, $w \in F^n$. Call this a coset leader. The cosets are the elements of $F^n/C \cong F^{n-k}$, so there are $2^{n-k}$ different coset leaders.
(2) Create a table of elements $eH \in F^{n-k}$, one for each coset leader $e$.
(3) If $w$ is received, compute $wH$, then look at the table to find the $e$ such that $eH = wH$.
(4) Decode $w$ as $w - e$.
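The four steps above can be sketched in Python as follows (all names are hypothetical). Step (1) is done by brute force over all of $F^n$, visiting words in order of increasing weight so that the first word stored for each syndrome is a coset leader; that is only feasible for small $n$, and when coset leaders are not unique the sketch simply keeps the first one it meets.

```python
from itertools import product

def parity_check_matrix(A, k, n):
    """H = (A stacked on top of I_{n-k}), an n x (n-k) matrix over F_2."""
    identity = [[1 if i == j else 0 for j in range(n - k)] for i in range(n - k)]
    return [list(row) for row in A] + identity

def syndrome(w, H):
    """The syndrome wH of a word w (a tuple of 0/1 integers)."""
    return tuple(sum(w[i] * H[i][j] for i in range(len(H))) % 2
                 for j in range(len(H[0])))

def syndrome_table(H, n):
    """Steps (1)-(2): map each syndrome to a coset leader, by brute force.
    Words are visited in order of increasing weight, so the first word
    stored for a syndrome has minimal weight in its coset."""
    table = {}
    for e in sorted(product((0, 1), repeat=n), key=sum):
        table.setdefault(syndrome(e, H), e)
    return table

def decode(w, H, table):
    """Steps (3)-(4): look up e with eH = wH and return w - e."""
    e = table[syndrome(w, H)]
    return tuple((wi - ei) % 2 for wi, ei in zip(w, e))

# The Hamming code: G = (I_4 | A).
A = [[0, 1, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
H = parity_check_matrix(A, 4, 7)
table = syndrome_table(H, 7)
print(decode((1, 0, 1, 1, 0, 1, 1), H, table))  # (1, 0, 1, 1, 0, 1, 0)
```

For the Hamming code the table has $2^{7-4} = 8$ entries, and it agrees with the syndrome look-up table below.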


Warning. Coset leaders are not necessarily unique: there might be several elements in a given coset $w + C$ having the same minimal weight. For example, if $e_i + e_j$ is a code word, then $e_i + C = e_j + C$, so either $e_i$ or $e_j$ could be chosen as a coset leader for $e_i + C$.

We call $wH$ the syndrome of $w$. The table we create is called a syndrome look-up table. Here is what it is for the Hamming code.

    Syndrome    Coset leader

    000         0000000
    001         0000001 = e_7
    010         0000010 = e_6
    011         1000000 = e_1
    100         0000100 = e_5
    101         0100000 = e_2
    110         0010000 = e_3
    111         0001000 = e_4